Modelling time series trend with tree based models

Joaquín Amat Rodrigo, Javier Escobar Ortiz
September, 2023

Introduction

Tree-based models, including decision trees, random forests and gradient boosting machines (GBMs), are known for their effectiveness and widespread use in various machine learning applications. However, they have limitations when it comes to extrapolation, i.e., making predictions or estimates beyond the range of observed data. This limitation becomes particularly critical when forecasting time-series data with a trend. Because these models lack the ability to predict values beyond the observed range during training, their predicted values will deviate from the underlying trend.
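Here is a minimal sketch of this limitation. It assumes scikit-learn's DecisionTreeRegressor as a representative tree-based model and uses a synthetic linear trend, not the dataset in this document. For simplicity, the time index is the only feature (no lags). The tree is trained on t from 0 to 99 and then asked to predict t from 100 to 119.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic series with a linear trend: y = 2 * t
t_train = np.arange(100).reshape(-1, 1)
y_train = 2.0 * t_train.ravel()

# Future time steps, outside the range seen during training
t_future = np.arange(100, 120).reshape(-1, 1)

tree = DecisionTreeRegressor(random_state=0)
tree.fit(t_train, y_train)
pred_future = tree.predict(t_future)

# Each prediction is the value stored in a leaf, which comes from the
# training targets, so it cannot exceed the largest training value.
print(pred_future[:3], y_train.max())
```

All future time steps fall into the same leaf, so the forecast is flat while the true trend keeps growing.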

Several strategies have been proposed to address this challenge, and one of the most frequently used is differentiation (also called differencing). This process computes the differences between consecutive observations of the time series. Instead of modeling the absolute values, the model learns the change from one observation to the next. Once the predictions have been estimated, the transformation is reversed to recover the values on their original scale.
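The transformation and its reversal can be sketched with NumPy alone. The example uses the first five monthly values of the air passengers series (112, 118, 132, 129, 121). First-order differences come from np.diff. The original values are recovered by adding the cumulative sum of the differences to the first observation.

```python
import numpy as np

y = np.array([112, 118, 132, 129, 121], dtype=float)

# First-order differencing: y_diff[t] = y[t + 1] - y[t]
y_diff = np.diff(y)  # one element shorter than y

# Reversal (integration): start from the first observation and
# accumulate the differences
y_restored = np.concatenate([y[:1], y[0] + np.cumsum(y_diff)])

print(y_diff)
print(y_restored)
```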

Since version 0.10.0, the skforecast library includes a differentiation parameter in its forecaster classes. This parameter indicates that a differentiation process must be applied before training the model. Internally, it relies on a new transformer named skforecast.preprocessing.TimeSeriesDifferentiator. The differentiation process is fully automated, and its effects are reversed during the prediction phase, so the forecasts are returned on the same scale as the original time series.

This document shows how differentiation can be used to model time series with a positive trend using tree-based models (a random forest and an XGBoost gradient boosting model).

Libraries

In [17]:
# Data manipulation
# ==============================================================================
import numpy as np
import pandas as pd

# Plots
# ==============================================================================
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-darkgrid')

# Modelling and Forecasting
# ==============================================================================
from xgboost import XGBRegressor
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import backtesting_forecaster
from skforecast.preprocessing import TimeSeriesDifferentiator
from sklearn.metrics import mean_absolute_error

Data

The dataset consists of monthly totals of international air passengers from 1949 to 1960.

In [18]:
# Download data
# ==============================================================================
url = (
    'https://raw.githubusercontent.com/JoaquinAmatRodrigo/Estadistica-machine-learning-python/'
    'master/data/AirPassengers.csv'
)
data = pd.read_csv(url, sep=',')

# Data preprocessing
# ==============================================================================
data['Date'] = pd.to_datetime(data['Date'], format='%Y-%m')
data = data.set_index('Date')
data = data.asfreq('MS')
data = data['Passengers']
data = data.sort_index()
data.head(4)
Out[18]:
Date
1949-01-01    112
1949-02-01    118
1949-03-01    132
1949-04-01    129
Freq: MS, Name: Passengers, dtype: int64

A differentiated copy of the data (order 1) is also created with TimeSeriesDifferentiator, so that both versions can be compared.

In [19]:
# Data differentiated
# ==============================================================================
differentiator = TimeSeriesDifferentiator(order=1)
data_diff = differentiator.fit_transform(data)
data_diff = pd.Series(data_diff, index=data.index).dropna()
data_diff.head(4)
Out[19]:
Date
1949-02-01     6.0
1949-03-01    14.0
1949-04-01    -3.0
1949-05-01    -8.0
Freq: MS, dtype: float64
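For order 1, the result can be cross-checked with pandas' Series.diff. It also computes y[t] - y[t-1] and leaves NaN in the first position. The series below holds the first five observations of the dataset. The fifth value, 121, follows from 129 - 8 in the output above.

```python
import pandas as pd

# First five monthly observations of the series (1949-01 to 1949-05)
data_head = pd.Series(
    [112, 118, 132, 129, 121],
    index=pd.date_range('1949-01-01', periods=5, freq='MS'),
    name='Passengers',
)

# Series.diff() computes y[t] - y[t-1]; the first value has no predecessor (NaN)
data_head_diff = data_head.diff().dropna()
print(data_head_diff)
```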
In [20]:
# Data partition train-test
# ==============================================================================
end_train = '1955-12-01 23:59:59'
print(
    f"Train dates : {data.index.min()} --- {data.loc[:end_train].index.max()}  " 
    f"(n={len(data.loc[:end_train])})")
print(
    f"Test dates  : {data.loc[end_train:].index.min()} --- {data.index.max()}  "
    f"(n={len(data.loc[end_train:])})")

# Plot
# ==============================================================================
fig, axs = plt.subplots(1, 2, figsize=(11, 2.5))
axs = axs.ravel()
data.loc[:end_train].plot(ax=axs[0], label='train')
data.loc[end_train:].plot(ax=axs[0], label='test')
axs[0].legend()
axs[0].set_title('Original data')

data_diff.loc[:end_train].plot(ax=axs[1], label='train')
data_diff.loc[end_train:].plot(ax=axs[1], label='test')
axs[1].legend()
axs[1].set_title('Differentiated data');
Train dates : 1949-01-01 00:00:00 --- 1955-12-01 00:00:00  (n=84)
Test dates  : 1956-01-01 00:00:00 --- 1960-12-01 00:00:00  (n=60)

Forecasting with Random Forest and Gradient Boosting

Two autoregressive forecasters are created: one with a scikit-learn RandomForestRegressor and the other with XGBoost's XGBRegressor. Both are trained on data from 1949-01-01 to 1955-12-01 and forecast the next 60 months (5 years).

In [21]:
# Forecasting without differentiation
# ==============================================================================
steps = len(data.loc[end_train:])

# Forecasters
forecaster_rf = ForecasterAutoreg(
                    regressor = RandomForestRegressor(random_state=963),
                    lags      = 12
                )
forecaster_gb = ForecasterAutoreg(
                    regressor = XGBRegressor(random_state=963),
                    lags      = 12
                )

# Train
forecaster_rf.fit(data.loc[:end_train])
forecaster_gb.fit(data.loc[:end_train])

# Predict
predictions_rf = forecaster_rf.predict(steps=steps)
predictions_gb = forecaster_gb.predict(steps=steps)

# Error
error_rf = mean_absolute_error(data.loc[end_train:], predictions_rf)
error_gb = mean_absolute_error(data.loc[end_train:], predictions_gb)
print(f"Error (MAE) Random Forest: {error_rf:.2f}")
print(f"Error (MAE) Gradient Boosting: {error_gb:.2f}")

# Plot
fig, ax = plt.subplots(figsize=(7, 3), sharex=True, sharey=True)
data.loc[:end_train].plot(ax=ax, label='train')
data.loc[end_train:].plot(ax=ax, label='test')
predictions_rf.plot(ax=ax, label='Random Forest')
predictions_gb.plot(ax=ax, label='Gradient Boosting')
ax.set_title(f'Forecasting without differentiation')
ax.set_xlabel('')
ax.legend();
Error (MAE) Random Forest: 66.10
Error (MAE) Gradient Boosting: 54.81

The plot shows that neither model is capable of accurately predicting the trend. After a few steps, the predictions become nearly constant, close to the maximum values observed in the training data.

Next, two new forecasters are trained using the same configuration, but with the argument differentiation = 1. This activates the internal process of differencing (order 1) the time series before training the model, and reverses the differentiation (also known as integration) for the predicted values.

In [22]:
# Forecasting with differentiation
# ==============================================================================
steps = len(data.loc[end_train:])

# Forecasters
forecaster_rf = ForecasterAutoreg(
                    regressor       = RandomForestRegressor(random_state=963),
                    lags            = 12,
                    differentiation = 1
                )
forecaster_gb = ForecasterAutoreg(
                    regressor       = XGBRegressor(random_state=963),
                    lags            = 12,
                    differentiation = 1
                )

# Train
forecaster_rf.fit(data.loc[:end_train])
forecaster_gb.fit(data.loc[:end_train])

# Predict
predictions_rf = forecaster_rf.predict(steps=steps)
predictions_gb = forecaster_gb.predict(steps=steps)

# Error
error_rf = mean_absolute_error(data.loc[end_train:], predictions_rf)
error_gb = mean_absolute_error(data.loc[end_train:], predictions_gb)
print(f"Error (MAE) Random Forest: {error_rf:.2f}")
print(f"Error (MAE) Gradient Boosting: {error_gb:.2f}")

# Plot
fig, ax = plt.subplots(figsize=(7, 3), sharex=True, sharey=True)
data.loc[:end_train].plot(ax=ax, label='train')
data.loc[end_train:].plot(ax=ax, label='test')
predictions_rf.plot(ax=ax, label='Random Forest')
predictions_gb.plot(ax=ax, label='Gradient Boosting')
ax.set_title(f'Forecasting with differentiation')
ax.set_xlabel('')
ax.legend();
Error (MAE) Random Forest: 53.76
Error (MAE) Gradient Boosting: 29.16

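This time, both models follow the trend in their predictions. The mean absolute error also drops for both: from 66.10 to 53.76 for the random forest, and from 54.81 to 29.16 for gradient boosting.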
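The internal process can be sketched in a few lines for order-1 differencing only. The sketch assumes NumPy and scikit-learn's DecisionTreeRegressor. make_lag_matrix and fit_predict_differenced are hypothetical helpers written for this illustration, not skforecast's implementation. The steps are:

- difference the training series;
- train the regressor on lagged differences;
- forecast the differences recursively;
- integrate them starting from the last observed value.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def make_lag_matrix(series, lags):
    """Hypothetical helper: X holds lags 1..lags (lag 1 first), y the target."""
    X = np.column_stack(
        [series[lags - i - 1 : len(series) - i - 1] for i in range(lags)]
    )
    y = series[lags:]
    return X, y


def fit_predict_differenced(y_train, lags, steps, regressor):
    """Hypothetical sketch: differentiate, train, forecast recursively, integrate."""
    y_train = np.asarray(y_train, dtype=float)
    y_diff = np.diff(y_train)                   # 1. differentiate (order 1)
    X, target = make_lag_matrix(y_diff, lags)   # 2. lagged predictors
    regressor.fit(X, target)                    # 3. train on the differences

    window = list(y_diff[-lags:])               # last `lags` differences
    pred_diff = []
    for _ in range(steps):                      # 4. recursive forecasting
        x_next = np.array(window[-lags:][::-1]).reshape(1, -1)  # lag 1 first
        next_diff = regressor.predict(x_next)[0]
        pred_diff.append(next_diff)
        window.append(next_diff)

    # 5. integrate: accumulate predicted differences from the last observed value
    return y_train[-1] + np.cumsum(pred_diff)


# Linear trend whose differences are all equal to 2
y = 100 + 2.0 * np.arange(60)
pred = fit_predict_differenced(
    y, lags=3, steps=5, regressor=DecisionTreeRegressor(random_state=0)
)
print(pred)
```

The regressor only has to reproduce differences that lie within the training range (a constant 2 here). As a result, the integrated forecast keeps rising past the largest training value, which a tree trained on the raw values cannot do.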

Deep dive into differencing time series

The previous example showed how easy it is to introduce differentiation into the forecasting process thanks to the functionalities available in skforecast. However, several non-trivial transformations are needed behind the scenes for this to work smoothly.

The following sections describe:

  • Differentiation and integration (reverse differentiation) of any given time series.

  • Why managing the differentiation internally has advantages over the traditional approach of pre-transforming the entire time series before initiating the model training.

  • How to manage the differentiation when applying the Forecaster to new data that does not immediately follow the training data.

TimeSeriesDifferentiator

TimeSeriesDifferentiator is a custom transformer that follows the scikit-learn preprocessing API. This means it has the methods fit, transform, fit_transform and inverse_transform.

In [23]:
# Differentiation with TimeSeriesDifferentiator
# ==============================================================================
y = np.array([5, 8, 12, 10, 14, 17, 21, 19], dtype=float)
differentiator = TimeSeriesDifferentiator()
differentiator.fit(y)
y_diff = differentiator.transform(y)

print(f"Original time series   : {y}")
print(f"Differenced time series: {y_diff}")
Original time series   : [ 5.  8. 12. 10. 14. 17. 21. 19.]
Differenced time series: [nan  3.  4. -2.  4.  3.  4. -2.]

The process of differencing can be reversed (integration) using the inverse_transform method.

In [24]:
# Inverse transform
# ==============================================================================
differentiator.inverse_transform(y_diff)
Out[24]:
array([ 5.,  8., 12., 10., 14., 17., 21., 19.])
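The fit/transform/inverse_transform contract can be made concrete with a simplified re-implementation in NumPy. MinimalDifferentiator is an invented name, and this is not skforecast's source code. It assumes 1-D numeric input and mirrors the behaviour shown above:

- transform pads the first `order` positions with NaN;
- fit stores the initial values needed to undo each differencing step.

```python
import numpy as np


class MinimalDifferentiator:
    """Hypothetical, simplified differencing transformer (not skforecast's code)."""

    def __init__(self, order=1):
        self.order = order

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Keep the first value of the series at each differencing level;
        # they are the integration constants needed by inverse_transform.
        self.initial_values_ = []
        current = X
        for _ in range(self.order):
            self.initial_values_.append(current[0])
            current = np.diff(current)
        return self

    def transform(self, X, y=None):
        X = np.asarray(X, dtype=float)
        X_diff = np.diff(X, n=self.order)
        # Pad with NaN so the output has the same length as the input
        return np.concatenate([np.full(self.order, np.nan), X_diff])

    def fit_transform(self, X, y=None):
        return self.fit(X).transform(X)

    def inverse_transform(self, X, y=None):
        X = np.asarray(X, dtype=float)
        current = X[self.order:]  # drop the NaN padding
        # Undo one differencing level at a time, innermost level first
        for initial in reversed(self.initial_values_):
            current = np.concatenate([[initial], initial + np.cumsum(current)])
        return current


y_demo = np.array([5, 8, 12, 10, 14, 17, 21, 19], dtype=float)
differentiator_demo = MinimalDifferentiator(order=1)
y_demo_diff = differentiator_demo.fit_transform(y_demo)
y_demo_restored = differentiator_demo.inverse_transform(y_demo_diff)
print(y_demo_diff)
print(y_demo_restored)
```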

Warning

The inverse transformation process, inverse_transform, is applicable only to the same time series that was previously differentiated using the same TimeSeriesDifferentiator object. This limitation arises from the need to use the initial n values of the time series (n equals the order of differentiation) to successfully reverse the differentiation. These values are stored when the fit method is executed.
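The reason behind this restriction can be illustrated with plain NumPy, without the transformer itself. Two series that differ only in level have identical differences. Their differences therefore cannot be turned back into values without the initial value of the series they came from. series_a and series_b are made-up data.

```python
import numpy as np

series_a = np.array([5, 8, 12, 10, 14], dtype=float)    # series used in fit
series_b = np.array([50, 53, 57, 55, 59], dtype=float)  # same shape, higher level

diff_b = np.diff(series_b)  # identical to np.diff(series_a)

# Integration seeded with the initial value stored from series_a (wrong series)
restored_wrong = np.concatenate([series_a[:1], series_a[0] + np.cumsum(diff_b)])
# Integration seeded with series_b's own initial value
restored_right = np.concatenate([series_b[:1], series_b[0] + np.cumsum(diff_b)])

print(restored_wrong)
print(restored_right)
```

The wrongly seeded reconstruction has the right shape but is shifted by the difference between the two initial values (45 here).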


Note

An additional method, inverse_transform_next_window, is available in TimeSeriesDifferentiator. It is designed to be used inside the Forecasters to reverse the differentiation of the predicted values: if the Forecaster's regressor is trained on a differentiated time series, its predictions are also differentiated. The inverse_transform_next_window method returns the predictions to the original scale, assuming that they start immediately after the last observed values (last_window).
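The idea of reversing the differentiation of predictions that start right after last_window can be sketched in NumPy. undo_diff_next_window is a hypothetical helper written for this illustration. It does not reproduce the signature of inverse_transform_next_window. For order n, it needs the last n observed values to recover the last value at every differencing level, and then integrates the predictions level by level. The order-1 example reuses the first backtesting fold from the backtesting section below: last observed value 278, followed by the first five predicted differences.

```python
import numpy as np


def undo_diff_next_window(last_window, pred_diff, order=1):
    """Hypothetical helper: integrate predictions that follow `last_window`.

    last_window : most recent observed values (at least `order` of them).
    pred_diff   : predictions expressed as `order`-th differences.
    """
    last_window = np.asarray(last_window, dtype=float)
    # Last observed value at each differencing level (level 0 = original scale)
    last_values = []
    current = last_window
    for _ in range(order):
        last_values.append(current[-1])
        current = np.diff(current)
    # Integrate from the highest differencing level back to the original scale
    result = np.asarray(pred_diff, dtype=float)
    for last_value in reversed(last_values):
        result = last_value + np.cumsum(result)
    return result


# Order 1: last observed value 278 followed by five predicted differences
pred_order_1 = undo_diff_next_window([278.0], [25.18, -9.48, 28.98, 3.84, 0.27])

# Order 2: squares 1, 4, 9, 16, 25 have constant second differences equal to 2
pred_order_2 = undo_diff_next_window([1, 4, 9, 16, 25], [2, 2, 2], order=2)

print(pred_order_1)
print(pred_order_2)
```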

Internal differentiation vs pre-processing

Forecasters manage the differentiation process internally, so there is no need for additional pre-processing of the time series and post-processing of the predictions. This has several advantages, but before diving in, the results of both approaches are compared.

In [25]:
# Time series differentiated by preprocessing before training
# ==============================================================================
differentiator = TimeSeriesDifferentiator(order=1)
data_diff = differentiator.fit_transform(data)
data_diff = pd.Series(data_diff, index=data.index).dropna()

forecaster = ForecasterAutoreg(
                 regressor = RandomForestRegressor(random_state=963),
                 lags      = 15
             )
forecaster.fit(y=data_diff.loc[:end_train])
predictions_diff = forecaster.predict(steps=steps)

# Revert differentiation to obtain final predictions
last_value_train = data.loc[:end_train].iloc[[-1]]
predictions_1 = pd.concat([last_value_train, predictions_diff]).cumsum()[1:]
predictions_1 = predictions_1.asfreq('MS')
predictions_1.name = 'pred'
predictions_1.head(5)
Out[25]:
1956-01-01    303.18
1956-02-01    293.70
1956-03-01    322.68
1956-04-01    326.52
1956-05-01    326.79
Freq: MS, Name: pred, dtype: float64
In [26]:
# Time series differentiated internally by the forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                 regressor       = RandomForestRegressor(random_state=963),
                 lags            = 15,
                 differentiation = 1
             )
forecaster.fit(y=data.loc[:end_train])
predictions_2 = forecaster.predict(steps=steps)
predictions_2.head(5)
Out[26]:
1956-01-01    303.18
1956-02-01    293.70
1956-03-01    322.68
1956-04-01    326.52
1956-05-01    326.79
Freq: MS, Name: pred, dtype: float64
In [27]:
# Compare both predictions
# ==============================================================================
pd.testing.assert_series_equal(predictions_1, predictions_2)

Next, the backtesting results of both approaches are compared. This comparison is more involved than the previous one, because the differentiation must be reversed separately for each backtesting fold.
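The per-fold reversal can be sketched with NumPy. In backtesting, each fold starts from the real value observed just before it, not from the last prediction of the previous fold. A single cumulative sum over all folds would therefore drift. integrate_by_fold is a hypothetical helper, and the numbers are made up.

```python
import numpy as np


def integrate_by_fold(pred_diff, anchors, steps):
    """Hypothetical helper: undo order-1 differencing separately for each fold.

    pred_diff : predicted differences of all folds, concatenated in time order.
    anchors   : for each fold, the observed value just before the fold starts.
    steps     : number of predictions per fold (the last fold may be shorter).
    """
    pred_diff = np.asarray(pred_diff, dtype=float)
    pred = np.empty_like(pred_diff)
    for fold, anchor in enumerate(anchors):
        fold_slice = slice(fold * steps, (fold + 1) * steps)
        # Each fold restarts the cumulative sum from its own observed value
        pred[fold_slice] = anchor + np.cumsum(pred_diff[fold_slice])
    return pred


# Two folds of steps=3 (the second one incomplete)
pred_diff = [1.0, 1.0, 1.0, 2.0, 2.0]
anchors = [10.0, 20.0]
per_fold = integrate_by_fold(pred_diff, anchors, steps=3)
single_cumsum = anchors[0] + np.cumsum(pred_diff)  # ignores the second anchor

print(per_fold)
print(single_cumsum)
```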

In [28]:
# Backtesting with the time series differentiated by preprocessing before training
# ==============================================================================
steps = 5
forecaster_1 = ForecasterAutoreg(
                   regressor = RandomForestRegressor(random_state=963),
                   lags      = 15
               )

_, predictions_1 = backtesting_forecaster(
                       forecaster            = forecaster_1,
                       y                     = data_diff,
                       steps                 = steps,
                       metric                = 'mean_squared_error',
                       initial_train_size    = len(data_diff.loc[:end_train]),
                       fixed_train_size      = False,
                       gap                   = 0,
                       allow_incomplete_fold = True,
                       refit                 = True,
                       n_jobs                = 'auto',
                       verbose               = False,
                       show_progress         = True  
                   )

# Revert differentiation of predictions. Predictions of each fold must be reverted
# individually. An id is added to each prediction to identify the fold to which it belongs.
predictions_1 = predictions_1.rename(columns={'pred': 'pred_diff'})
folds = len(predictions_1) / steps
folds = int(np.ceil(folds))
predictions_1['backtesting_fold_id'] = np.repeat(range(folds), steps)[:len(predictions_1)]

# Add the previously observed value of the time series (only to the first prediction of each fold)
previous_observed_values = data.shift(1).loc[predictions_1.index].iloc[::steps]
previous_observed_values.name = 'previous_observed_value'
predictions_1 = predictions_1.merge(
                    previous_observed_values,
                    left_index  = True,
                    right_index = True,
                    how         = 'left'
                )
predictions_1 = predictions_1.fillna(0)
predictions_1['summed_value'] = (
    predictions_1['pred_diff'] + predictions_1['previous_observed_value']
)

# Revert differentiation using the cumulative sum by fold
predictions_1['pred'] = (
    predictions_1
    .groupby('backtesting_fold_id')['summed_value']
    .cumsum()
)

predictions_1.head(5)
Out[28]:
            pred_diff  backtesting_fold_id  previous_observed_value  summed_value    pred
1956-01-01      25.18                    0                    278.0        303.18  303.18
1956-02-01      -9.48                    0                      0.0         -9.48  293.70
1956-03-01      28.98                    0                      0.0         28.98  322.68
1956-04-01       3.84                    0                      0.0          3.84  326.52
1956-05-01       0.27                    0                      0.0          0.27  326.79
In [29]:
# Backtesting with the time series differentiated internally
# ==============================================================================
forecaster_2 = ForecasterAutoreg(
                   regressor       = RandomForestRegressor(random_state=963),
                   lags            = 15,
                   differentiation = 1
               )

_, predictions_2 = backtesting_forecaster(
                       forecaster            = forecaster_2,
                       y                     = data,
                       steps                 = steps,
                       metric                = 'mean_squared_error',
                       initial_train_size    = len(data.loc[:end_train]),
                       fixed_train_size      = False,
                       gap                   = 0,
                       allow_incomplete_fold = True,
                       refit                 = True,
                       n_jobs                = 'auto',
                       verbose               = False,
                       show_progress         = True  
                   )

predictions_2.head(5)
Out[29]:
              pred
1956-01-01  303.18
1956-02-01  293.70
1956-03-01  322.68
1956-04-01  326.52
1956-05-01  326.79
In [30]:
# Compare both predictions
# ==============================================================================
pd.testing.assert_series_equal(predictions_1['pred'], predictions_2['pred'])

As demonstrated, the values are the same whether the time series is differentiated in a preprocessing step or the Forecaster manages the differentiation internally. So why is the second alternative better?

  • Allowing the forecaster to manage all transformations internally guarantees that the same transformations are applied when the model is run on new data.

  • When the model is applied to new data that does not immediately follow the training data (for example, if the model is not retrained before each prediction phase), the forecaster handles three tasks automatically. It increases the size of the last window needed to generate the predictors, applies the differentiation to the incoming data and reverts it in the final predictions.
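Here is a small NumPy illustration of why the last window grows. With lags = 15 and differentiation = 1 (the configuration used above), differencing consumes one observation. Building the 15 lagged differences therefore requires 16 raw observations. The formula window_size = lags + differentiation is an assumption about how the extra length is computed, and new_last_window holds made-up values.

```python
import numpy as np

lags = 15
differentiation = 1
window_size = lags + differentiation  # raw observations needed (assumed formula)

# Made-up recent observations that do not follow the training data
new_last_window = 400.0 + 3.0 * np.arange(window_size)

# Differencing consumes `differentiation` values...
diffs = np.diff(new_last_window, n=differentiation)
# ...leaving exactly `lags` differences to use as predictors (lag 1 first)
predictors = diffs[::-1]

# Predicted differences would later be integrated starting from this value
integration_anchor = new_last_window[-1]
print(window_size, predictors.shape, integration_anchor)
```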

These transformations are non-trivial and error-prone. skforecast handles them internally to avoid complicating the already challenging task of forecasting time series.

Session information

In [31]:
import session_info
session_info.show(html=False)
-----
matplotlib          3.7.2
numpy               1.25.2
pandas              2.0.3
session_info        1.0.0
skforecast          0.10.0
sklearn             1.3.0
xgboost             1.7.6
-----
IPython             8.14.0
jupyter_client      8.3.0
jupyter_core        5.3.1
-----
Python 3.11.4 (main, Jul  5 2023, 13:45:01) [GCC 11.2.0]
Linux-5.15.0-1044-aws-x86_64-with-glibc2.31
-----
Session information updated at 2023-09-07 08:12


How to cite this document?

Modelling time series trend with tree based models by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under a CC BY-NC-SA 4.0 license at https://www.cienciadedatos.net/documentos/py49-modelling-time-series-trend-with-tree-based-models.html


This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.