Recurrent Neural Network (RNN) Time Series Prediction With PyTorch
A Comprehensive Guide to Sequence Forecasting with Recurrent Neural Networks
Introduction
Recurrent Neural Networks (RNNs) are a class of neural networks designed to work with sequential data. Unlike traditional feedforward networks, RNNs maintain a "memory" of previous inputs through hidden states, making them particularly effective for time series forecasting.
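At each time step, the hidden state is updated from the current input and the previous hidden state. A minimal sketch of that recurrence in plain PyTorch (the tanh activation mirrors nn.RNN's default; the weight names and sizes here are illustrative assumptions):

import torch

# Illustrative sizes (assumptions): 1 input feature, 8 hidden units, 5 time steps
input_size, hidden_size, seq_len = 1, 8, 5
W_xh = torch.randn(hidden_size, input_size)   # input-to-hidden weights
W_hh = torch.randn(hidden_size, hidden_size)  # hidden-to-hidden weights
b_h = torch.zeros(hidden_size)                # hidden bias

x = torch.randn(seq_len, input_size)          # one univariate sequence
h = torch.zeros(hidden_size)                  # initial hidden state (the "memory")
for t in range(seq_len):
    # The new memory depends on the current input AND the previous hidden state
    h = torch.tanh(W_xh @ x[t] + W_hh @ h + b_h)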
Key Concepts
- Sequential Processing: RNNs process data points in sequence, maintaining information about previous steps
- Hidden State: Internal memory that captures information about previous inputs
- Time Unrolling: The process of visualizing RNNs across time steps
- Backpropagation Through Time (BPTT): The training algorithm for RNNs
Applications
- Stock price prediction
- Weather forecasting
- Energy demand prediction
- Sensor data analysis
- Economic indicators forecasting
Note: While this guide focuses on basic RNNs, the same principles apply to more advanced architectures like LSTMs and GRUs, which are better at capturing long-term dependencies.
Implementation
1. Data Preparation
Time series data requires special preprocessing to create sequences suitable for RNN training.
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
def generate_sine_wave(seq_len, num_samples=1000):
    """Generate sine wave time series data with sequences."""
    x = np.linspace(0, 100, num_samples)
    y = np.sin(x)
    sequences = []
    targets = []
    for i in range(len(y) - seq_len):
        sequences.append(y[i:i+seq_len])
        targets.append(y[i+seq_len])
    return np.array(sequences), np.array(targets)
class TimeSeriesDataset(Dataset):
    """Custom Dataset for time series data."""
    def __init__(self, sequences, targets):
        # Add a trailing feature dimension: (num_samples, seq_len, 1) and (num_samples, 1)
        self.sequences = torch.FloatTensor(sequences).unsqueeze(-1)
        self.targets = torch.FloatTensor(targets).unsqueeze(-1)

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        return self.sequences[idx], self.targets[idx]
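A minimal usage sketch tying these pieces together (the seq_len and batch_size values are assumptions; the resulting train_loader is the one used by the training loop later in this guide):

seq_len = 20                                                      # assumed window length
sequences, targets = generate_sine_wave(seq_len)
dataset = TimeSeriesDataset(sequences, targets)
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)   # assumed batch size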
Key Steps
- Generate or load time series data
- Create sliding window sequences
- Split into input sequences and target values
- Convert to PyTorch tensors
- Create Dataset and DataLoader for batching
Parameters
- seq_len: Length of input sequences (time steps)
- num_samples: Total number of data points
- batch_size: Number of sequences per training batch
2. Model Architecture
The model pairs a recurrent layer (nn.RNN) with a fully connected output layer; an embedding layer would only be needed for discrete inputs, which this univariate regression task does not use.
import torch.nn as nn
class RNNModel(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, num_layers=1):
        super(RNNModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        # Recurrent layer
        self.rnn = nn.RNN(
            input_size=input_size,
            hidden_size=hidden_size,
            num_layers=num_layers,
            batch_first=True
        )
        # Output layer
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x, hidden=None):
        # Initialize hidden state if not provided
        if hidden is None:
            hidden = torch.zeros(self.num_layers, x.size(0), self.hidden_size).to(x.device)
        # Forward pass through RNN: out has shape (batch, seq_len, hidden_size)
        out, hidden = self.rnn(x, hidden)
        # Use only the last time step's output for the prediction
        out = self.fc(out[:, -1, :])          # shape: (batch, output_size)
        return out, hidden
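A quick shape check under the same assumptions as the data section (a batch of 16 windows, 20 time steps, one feature):

model = RNNModel(input_size=1, hidden_size=32, output_size=1)
dummy = torch.randn(16, 20, 1)                 # (batch, seq_len, features)
out, hidden = model(dummy)
print(out.shape, hidden.shape)                 # torch.Size([16, 1]) torch.Size([1, 16, 32])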
Components
- RNN Layer: Processes sequential data with hidden state
- Linear Layer: Maps final hidden state to output
- Hidden State: Maintains memory between time steps
Hyperparameters
- input_size: Dimension of input features (1 for univariate)
- hidden_size: Number of units in the hidden state
- output_size: Dimension of output (1 for regression)
- num_layers: Number of stacked RNN layers (default 1)
For better performance on long sequences, consider using LSTM (nn.LSTM) or GRU (nn.GRU) layers, which are better at capturing long-range dependencies.
3. Training Process
The training loop involves forward passes, loss calculation, and backpropagation through time.
# Initialize model, loss function, and optimizer
model = RNNModel(input_size=1, hidden_size=32, output_size=1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Training loop
num_epochs = 100
for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for batch_x, batch_y in train_loader:
        # Zero gradients
        optimizer.zero_grad()
        # Forward pass: batch_x is (batch, seq_len, 1), outputs is (batch, 1)
        outputs, _ = model(batch_x)
        loss = criterion(outputs, batch_y)
        # Backward pass and optimize
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    # Print training progress
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {total_loss/len(train_loader):.4f}')
Training Components
- Loss Function: Mean Squared Error (MSE) for regression
- Optimizer: Adam with a learning rate of 0.001
- Epochs: Complete passes through the dataset
- Batches: Subsets of data for gradient updates
Common Issues
- Vanishing/exploding gradients (see the clipping sketch after this list)
- Overfitting on training data
- Difficulty learning long-term dependencies
- Sensitivity to hyperparameters
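For the vanishing/exploding-gradient issue listed above, one common mitigation is to clip the global gradient norm between the backward pass and the optimizer step (the max_norm value is an assumption to tune for your data):

# Inside the training loop, between loss.backward() and optimizer.step()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # assumed clipping threshold
optimizer.step()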
4. Model Evaluation
After training, we evaluate the model on test data to assess its generalization ability.
import matplotlib.pyplot as plt

def evaluate(model, test_sequences, test_targets):
    model.eval()
    with torch.no_grad():
        # Convert to tensor with a trailing feature dimension: (num_samples, seq_len, 1)
        test_seq = torch.FloatTensor(test_sequences).unsqueeze(-1)
        # Make predictions
        predictions, _ = model(test_seq)
        predictions = predictions.view(-1).numpy()
    # Plot results
    plt.figure(figsize=(12, 6))
    plt.plot(np.arange(len(test_targets)), test_targets, label='True Values')
    plt.plot(np.arange(len(predictions)), predictions, 'ro', label='Predictions')
    plt.legend()
    plt.title('Model Predictions vs Actual Values')
    plt.show()
    return predictions
Evaluation Metrics
- Mean Squared Error (MSE): Standard regression metric
- Mean Absolute Error (MAE): Robust to outliers
- R-squared: Variance explained by model
- Visual Inspection: Plot predictions vs actual
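A small sketch computing the first three metrics with NumPy, assuming `predictions` and `test_targets` are the 1-D arrays from the evaluation step above:

mse = np.mean((predictions - test_targets) ** 2)              # Mean Squared Error
mae = np.mean(np.abs(predictions - test_targets))             # Mean Absolute Error
ss_res = np.sum((test_targets - predictions) ** 2)
ss_tot = np.sum((test_targets - test_targets.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                                       # R-squared
print(f'MSE: {mse:.4f}, MAE: {mae:.4f}, R2: {r2:.4f}')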
Prediction Strategy
- Single-step: Predict next value only
- Multi-step: Predict multiple future values
- Recursive: Use predictions as new inputs (see the sketch after this list)
- Direct: Separate model for each future step
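A sketch of the recursive strategy, feeding each prediction back in as the newest input. It assumes the trained RNNModel above and a hypothetical seed window `seed_seq` of shape (1, seq_len, 1):

def recursive_forecast(model, seed_seq, steps):
    """Predict `steps` future values by sliding the window over the model's own predictions."""
    model.eval()
    window = seed_seq.clone()
    preds = []
    with torch.no_grad():
        for _ in range(steps):
            out, _ = model(window)                     # out: (1, 1), the next-step prediction
            preds.append(out.item())
            next_step = out.view(1, 1, 1)              # treat the prediction as a new time step
            window = torch.cat([window[:, 1:, :], next_step], dim=1)  # slide the window forward
    return preds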
Advanced Topics
LSTM and GRU Architectures
Advanced RNN variants that address the vanishing gradient problem.
LSTM (Long Short-Term Memory)
class LSTMModel(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # batch_first=True keeps the (batch, seq_len, features) layout used above
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, (h_n, c_n) = self.lstm(x)
        # Predict from the last time step's output
        return self.fc(out[:, -1, :])
Key Features:
- Input, output, and forget gates
- Cell state for long-term memory
- Better at learning long-range dependencies
GRU (Gated Recurrent Unit)
class GRUModel(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # batch_first=True keeps the (batch, seq_len, features) layout used above
        self.gru = nn.GRU(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, h_n = self.gru(x)
        # Predict from the last time step's output
        return self.fc(out[:, -1, :])
Key Features:
- Update and reset gates
- Simpler than LSTM but often comparable
- Fewer parameters than LSTM
Attention Mechanisms
Allows the model to focus on relevant parts of the input sequence.
class Attention(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        self.attention = nn.Linear(hidden_size, 1)

    def forward(self, rnn_output):
        # rnn_output shape: (seq_len, batch, hidden_size)
        attention_weights = torch.softmax(
            self.attention(rnn_output), dim=0
        )
        # Weighted sum over time steps -> (batch, hidden_size)
        return (attention_weights * rnn_output).sum(dim=0)
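A small usage sketch (the layer sizes are illustrative assumptions) showing the module pooling a seq-first RNN output into one context vector per batch element:

rnn = nn.RNN(input_size=1, hidden_size=32)     # default layout: (seq_len, batch, features)
attn = Attention(hidden_size=32)
rnn_output, _ = rnn(torch.randn(20, 4, 1))     # (seq_len=20, batch=4, hidden=32)
context = attn(rnn_output)                     # (batch=4, hidden=32)
prediction = nn.Linear(32, 1)(context)         # (batch=4, 1)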
Benefits of Attention:
- Interpretability (can see which time steps are important)
- Better performance on long sequences
- Flexibility in focusing on relevant parts of history
Multivariate Time Series
Extending the model to handle multiple input features.
# Multivariate data shape: (num_samples, seq_len, num_features)
# feature1, feature2, feature3 are placeholder arrays, each of shape (num_samples, seq_len)
multivariate_data = np.stack([feature1, feature2, feature3], axis=-1)

class MultivariateRNN(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # batch_first=True keeps the (batch, seq_len, features) layout used above
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, h_n = self.rnn(x)
        # Predict from the last time step's output
        return self.fc(out[:, -1, :])
Considerations:
- Input size becomes number of features
- May need more hidden units
- Feature scaling becomes more important
- Can capture cross-feature dependencies
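A hypothetical usage sketch of the points above (standardized features, input_size equal to the number of features; the hidden size is an assumption):

# Standardize each feature, then feed all three features per time step
scaled = (multivariate_data - multivariate_data.mean(axis=(0, 1))) / multivariate_data.std(axis=(0, 1))
x = torch.FloatTensor(scaled)                             # (num_samples, seq_len, 3)
mv_model = MultivariateRNN(input_size=3, hidden_size=64)  # input_size = number of features
predictions = mv_model(x)                                 # (num_samples, 1)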
Production Considerations
Deploying time series models in real-world applications.
Deployment Options
- TorchScript for optimized inference (see the export sketch after this list)
- ONNX runtime for cross-platform
- Flask/FastAPI web services
- Cloud functions (AWS Lambda, GCP Functions)
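A minimal TorchScript export sketch for the first deployment option (the example input shape and file name are assumptions):

# Trace the trained model with a representative input, then save it for optimized inference
example_input = torch.randn(1, 20, 1)          # assumed (batch, seq_len, features) shape
traced = torch.jit.trace(model, example_input)
traced.save('rnn_forecaster.pt')               # assumed file name
# The saved module can later be loaded with torch.jit.load('rnn_forecaster.pt')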
Monitoring
- Track prediction drift over time
- Monitor input data distribution
- Set up alerting for anomalies
- Periodic model retraining
Additional Resources
Useful Libraries
- sktime - Time series machine learning
- Prophet - Forecasting at scale
- TensorFlow - Alternative to PyTorch