Deep Temporal Forecasting with Recurrent Neural Networks (RNNs)

Forecasting time series

Here, we focus on using Recurrent Neural Networks or RNNs for forecasting time series. Specifically, we use Python and TensorFlow to implement an RNN capable of forecasting monthly airline passengers 12-months into the future with ~97% accuracy:

Jupyter Notebook: Recurrent Neural Networks (RNNs) for Time Series Prediction

Recurrent neural networks – big picture

While we consider the problem of forecasting monthly airline passengers, our goal is general-purpose time series prediction. Towards this goal, we look at RNNs. RNNs’ architectures, as Olah (2015) notes, are naturally well suited for time series data. Unlike traditional Artificial Neural Networks or ANNs, which learn from scalar inputs, RNNs learn from sequences of scalar inputs. As Géron (2017) notes, a recurrent neuron learns sequences by looping its output back to itself at the next time step or frame. This output and the input at the next time step are received by the recurrent neuron and contribute to its activation. Such recurrent learning is what gives RNNs their so-called memory and of course their name.

Unlike traditional Artificial Neural Networks or ANNs, which learn from scalar inputs, RNNs learn from sequences of scalar inputs.

Importing, formatting, and visualizing data

Our data is Box, Jenkins, and Reinsel’s (1976) AirPassengers data set, which can be downloaded here. Data consists of two comma separated columns, Month and #Passengers. Month is the month of the year given in YYYY-MM format and #Passengers is the number of passengers, in thousands, that traveled in a particular month. Data ranges from January 1949 to December 1960.

Import Pandas, Numpy, Matplotlib, and TensorFlow.

# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

Read in data, format dates, and sort ascending by date.

# Import data, format dates, sort data ascending by date
data = pd.read_csv(
data['date'] = pd.to_datetime(data['date'])
data = data.sort_values(by=['date'], ascending=True)

# Store date and no_passengers columns in numpy style arrays
dates = data['date'].as_matrix()
nos_passengers = data['no_passengers'].as_matrix()

Plot monthly airline passengers as a function of time.  This is what we aim to predict.

# Plot monthly airline passengers
plt.plot(dates, nos_passengers)
plt.title('Monthly airline passengers', fontsize=18)
plt.xlabel('Date', fontsize=18)
plt.ylabel('No. of passengers [thousands]', fontsize=18)

Split data into training and validation sets

We aim to forecast monthly airline passengers 12-months into the future. So we separate out our last 12-months of data from our first 132-months of data so they can be used to validate our RNN, which will be trained on the first 132-months of data only.

# Number of steps (months) into the future to predict
n_pred_steps = 12

# Split data into training and validation sets
dates_train = dates[0:len(dates)-n_pred_steps]
nos_passengers_train = nos_passengers[0:len(nos_passengers)-n_pred_steps]

dates_valid = dates[len(dates)-n_pred_steps:len(dates)]
nos_passengers_valid = nos_passengers[len(nos_passengers)-n_pred_steps:len(nos_passengers)]

Redefining the problem

Since our data has a general upward trend, new sequences of inputs used to generate predictions may, at times, be out-of-range of old sequences of inputs used to train the network. This out-of-range issue is tantamount to extrapolation, which is bad, and affects any time series at or near its all time high or all time low.

To remedy this issue, we redefine the problem. Instead of forecasting monthly airline passengers, we predict the percent change in monthly airline passengers from the previous month. Note: we assume no percent change in monthly airline passengers from the previous month at the first month.

# Compute percent change from previous month to current month
previous_train = nos_passengers_train[0:len(nos_passengers_train)-1]
current_train = nos_passengers_train[1:len(nos_passengers_train)]
changes_train = ((current_train - previous_train) / previous_train) * 100

# Assume no percent change from previous month at first month
changes_train = np.insert(changes_train, 0, 0)

Plot monthly airline passengers percent change.

# Plot monthly airline passengers % change
plt.plot(dates_train, changes_train)
plt.title('Monthly airline passengers % change', fontsize=18)
plt.xlabel('Date', fontsize=18)
plt.ylabel('% Change', fontsize=18)

As we see in the above plot, the upward trend has been removed.  Now we can proceed with scaling.

Scaling redefined problem

Machine learning algorithms perform best when data is scaled appropriately. Typically, scaling of inputs, or feature scaling, is done using normalization or standardization. Normalization, or min-max scaling, usually involves scaling inputs to the range 0 to 1. Standardization involves subtracting out the mean and dividing by the standard deviation so that inputs have zero mean and unit variance. In general, normalization is best when inputs do not contain outliers and standardization is best when inputs do contain outliers. However, for some machine learning algorithms, it may be preferable to use one technique or the other.

From the “Monthly airline passengers % change” plot above, it is apparent that our input does not contain outliers. Therefore, we use normalization.

# Normalize percent changes
changes_train_min = min(changes_train)
changes_train_max = max(changes_train)
changes_train_normalized = (changes_train - changes_train_min) / (changes_train_max - changes_train_min)

Plot monthly airline passengers normalized percent change.

# Plot monthly airline passengers normalized % change
plt.plot(dates_train, changes_train_normalized)
plt.title('Monthly airline passengers normalized % change', fontsize=18)
plt.xlabel('Date', fontsize=18)
plt.ylabel('Normalized % Change', fontsize=18)

As we see in the above plot, percent changes have been normalized between 0 and 1.  This completes preprocessing of our data and we may now move on to constructing our RNN.

Hyperparameter initialization

Initialize graph, batch, and optimizer parameters.

# Initialize graph parameters
n_inputs = 1             # no. of nodes in input layer
n_hidden = 100           # no. of nodes in hidden layer
n_outputs = 1            # no. of nodes in output layer

# Initialize batch parameters
batch_size = 50          # no. of training examples to consider at each training iteration
n_steps = 12             # no. of steps in each input sequence

# Initialize optimizer parameters
n_iterations = 10000     # no. of training iterations to perform
learning_rate = 0.001    # learning rate for training

Constructing the computation graph

Reset the default graph.


Define placeholder nodes for batches of training inputs and outputs, respectively, to occupy.

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])
y = tf.placeholder(tf.float32, [None, n_steps, n_outputs])

Define RNN cells and a dynamic RNN, which performs fully dynamic unrolling of inputs.

cell = tf.contrib.rnn.BasicRNNCell(num_units=n_hidden, activation=tf.nn.relu)
rnn_outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

Stack all the outputs by reshaping from [batch_size, n_steps, n_hidden] to [batch_size * n_steps, n_hidden].

stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_hidden])

Project output by applying a single fully connected layer.

stacked_outputs = tf.layers.dense(stacked_rnn_outputs, n_outputs)

Unstack projection by reshaping from [batch_size * n_steps, n_outputs] to [batch_size, n_steps, n_outputs].

outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])

Define MSE loss function, an Adam optimizer, and the training operation.

loss = tf.reduce_mean(tf.square(outputs - y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

Create node in graph that will initialize all variables when run.

init = tf.global_variables_initializer()

Create a Saver node for saving and restoring model.

saver = tf.train.Saver()

Generating batches

Define a function to generate batches of training inputs and outputs at each training iteration.

def next_batch(input_sequence, batch_size, n_steps):
    i_first = 0
    i_last = len(input_sequence)
    i_starts = np.random.randint(i_first, high=i_last-n_steps, size=(batch_size, 1))
    i_sequences = i_starts + np.arange(0, n_steps + 1)
    flat_i_sequences = np.ravel(i_sequences[:,:])
    flat_sequences = input_sequence[flat_i_sequences]
    sequences = flat_sequences.reshape(batch_size,-1)
    return sequences[:, :-1].reshape(-1, n_steps, 1), sequences[:, 1:].reshape(-1, n_steps, 1)

Train RNN

with tf.Session() as sess:
    for iteration in range(n_iterations):
        X_batch, y_batch = next_batch(changes_train_normalized, batch_size, n_steps), feed_dict={X: X_batch, y: y_batch})
        if iteration % 1000 == 0:
            mse = loss.eval(feed_dict={X: X_batch, y: y_batch})
            print(iteration, "\tMSE:", mse)
       , "./AirPassengersModel")

0 MSE: 0.335494
1000 MSE: 0.0139177
2000 MSE: 0.0117318
3000 MSE: 0.00861321
4000 MSE: 0.00982438
5000 MSE: 0.00805407
6000 MSE: 0.00804817
7000 MSE: 0.00883125
8000 MSE: 0.00727863
9000 MSE: 0.0068532

Generate predictions

changes_train_normalized_working = changes_train_normalized
changes_pred_normalized = np.array([])

with tf.Session() as sess:
    saver.restore(sess, "./AirPassengersModel")
    for pred in range(n_pred_steps):
        flat_X_new = changes_train_normalized_working[len(changes_train_normalized_working)-n_steps:len(changes_train_normalized_working)]
        X_new = flat_X_new.reshape(1,-1).reshape(-1, n_steps, 1) 
        y_pred =, feed_dict={X: X_new})
        changes_pred_normalized = np.append(changes_pred_normalized, [y_pred[0, n_steps-1, 0]], axis=0)
        changes_train_normalized_working = np.append(changes_train_normalized, changes_pred_normalized, axis=0)

Denormalize predictions

changes_pred = changes_pred_normalized * (changes_train_max - changes_train_min) + changes_train_min

Put predictions in terms of original problem

nos_passengers_pred = np.array([nos_passengers_train[len(nos_passengers_train)-1]])
for pred in range(n_pred_steps):
    nos_passengers_pred = np.append(nos_passengers_pred, [nos_passengers_pred[pred] + nos_passengers_pred[pred] * (changes_pred[pred] / 100)], axis=0)
nos_passengers_pred = np.delete(nos_passengers_pred, 0)
# Plot monthly airline passengers (validation vs prediction)
plt.plot(dates_valid, nos_passengers_valid, '-', label="actual")
plt.plot(dates_valid, nos_passengers_pred, '--', label="predicted")
plt.title('Monthly airline passengers (actual vs predicted)', fontsize=18)
plt.xlabel('Month', fontsize=18)
plt.ylabel('No. of passengers [thousands]', fontsize=18)
plt.xticks(dates_valid, ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'], rotation='horizontal', fontsize=16)
plt.legend(loc="upper left", fontsize=16)

Calculate average percent error

percent_error = np.absolute((nos_passengers_pred - nos_passengers_valid) / nos_passengers_valid) * 100



Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1976). Time Series Analysis, Forecasting and Control (3rd ed.). San Francisco, CA: Holden-Day.

Géron, A. (2017). Hands-On Machine Learning with Scikit-Learn & TensorFlow. Sebastopol, CA: O’Reilly Media, Inc.

Olah, C. (2015). Understanding LSTM Networks. Colah’s Blog.

Leave a Reply

Your email address will not be published. Required fields are marked *