Introduction

Tree-based models, including decision trees, random forests and gradient boosting machines (GBMs), are known for their effectiveness and widespread use in various machine learning applications. However, they have limitations when it comes to extrapolation, i.e., making predictions or estimates beyond the range of observed data. This limitation becomes particularly critical when forecasting time-series data with a trend. Because these models lack the ability to predict values beyond the observed range during training, their predicted values will deviate from the underlying trend.
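The extrapolation limit can be seen in a minimal sketch. It assumes a hypothetical series with a pure linear trend and, for simplicity, uses the time index as the only predictor instead of the lags used later in this document. Because each leaf of the tree stores an average of training targets, no prediction can exceed the largest value seen during training.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical series with a pure linear trend: y = 2 * t
t_train = np.arange(50).reshape(-1, 1)
y_train = 2.0 * t_train.ravel()

tree = DecisionTreeRegressor(random_state=0).fit(t_train, y_train)

# Time steps far beyond the range observed during training
t_future = np.array([[60], [80], [100]])
pred_future = tree.predict(t_future)

print(f"Maximum value in training : {y_train.max()}")
print(f"Predictions beyond range  : {pred_future}")
```

All three future points fall into the same leaf, so the forecast is flat even though the true series keeps growing.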

Several strategies have been proposed to address this challenge. One of the most common is differentiation (also called differencing), which consists of computing the differences between successive observations in the time series. Rather than modeling the absolute values, the model learns the change from one observation to the next. Once the predictions are made, the transformation is reversed to recover the values in their original scale.
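As a rough illustration of the idea, the following sketch applies a first-order difference and then reverses it using only NumPy. The five values are the first observations of the air passengers series used in this document; the variable names are illustrative.

```python
import numpy as np

# First five observations of the air passengers series
y = np.array([112.0, 118.0, 132.0, 129.0, 121.0])

# First-order differencing: change between consecutive observations
y_diff = np.diff(y)  # [6., 14., -3., -8.]

# Reversing it requires the first original value plus the cumulative sum
y_restored = np.concatenate(([y[0]], y[0] + np.cumsum(y_diff)))

print(f"Differenced : {y_diff}")
print(f"Restored    : {y_restored}")
```

The differenced series has one value fewer than the original, and the first original value must be kept in order to undo the transformation.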

The skforecast library, version 0.10.0 or higher, introduces a novel differentiation parameter within its forecaster classes to indicate that a differentiation process must be applied before training the model. This is achieved by making internal use of a new transformer named skforecast.preprocessing.TimeSeriesDifferentiator. It should be noted that the differentiation process has been fully automated and its effects are reversed during the prediction phase, ensuring that the forecast values are in the same scale as the original time series data.

This document shows how differentiation can be used to model time series with a positive trend using tree-based models (a random forest and an XGBoost gradient boosting model).

Libraries

# Data manipulation
# ==============================================================================
import numpy as np
import pandas as pd

# Plots
# ==============================================================================
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-darkgrid')

# Modelling and Forecasting
# ==============================================================================
import skforecast
import sklearn
import xgboost
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from sklearn.ensemble import RandomForestRegressor
from skforecast.recursive import ForecasterRecursive
from skforecast.model_selection import TimeSeriesFold
from skforecast.model_selection import backtesting_forecaster
from skforecast.preprocessing import TimeSeriesDifferentiator
from sklearn.metrics import mean_absolute_error

# Warnings configuration
# ==============================================================================
import warnings

color = '\033[1m\033[38;5;208m' 
print(f"{color}Version skforecast: {skforecast.__version__}")
print(f"{color}Version scikit-learn: {sklearn.__version__}")
print(f"{color}Version xgboost: {xgboost.__version__}")

Version skforecast: 0.16.0
Version scikit-learn: 1.6.1
Version xgboost: 3.0.0

Data

The dataset consists of monthly totals of international air passengers from 1949 to 1960.

# Download data
# ==============================================================================
url = (
    'https://raw.githubusercontent.com/JoaquinAmatRodrigo/Estadistica-machine-learning-python/'
    'master/data/AirPassengers.csv'
)
data = pd.read_csv(url, sep=',')

# Data preprocessing
# ==============================================================================
data['Date'] = pd.to_datetime(data['Date'], format='%Y-%m')
data = data.set_index('Date')
data = data.asfreq('MS')
data = data['Passengers']
data = data.sort_index()
data.head(4)

Date
1949-01-01    112
1949-02-01    118
1949-03-01    132
1949-04-01    129
Freq: MS, Name: Passengers, dtype: int64

A differenced copy of the series (order 1) is also created using the TimeSeriesDifferentiator.

# Data differentiated
# ==============================================================================
diferenciator = TimeSeriesDifferentiator(order=1)
data_diff = diferenciator.fit_transform(data.to_numpy())
data_diff = pd.Series(data_diff, index=data.index).dropna()
data_diff.head(4)

Date
1949-02-01     6.0
1949-03-01    14.0
1949-04-01    -3.0
1949-05-01    -8.0
Freq: MS, dtype: float64

# Data partition train-test
# ==============================================================================
end_train = '1955-12-01 23:59:59'
print(
    f"Train dates : {data.index.min()} --- {data.loc[:end_train].index.max()}  " 
    f"(n={len(data.loc[:end_train])})")
print(
    f"Test dates  : {data.loc[end_train:].index.min()} --- {data.index.max()}  "
    f"(n={len(data.loc[end_train:])})")

# Plot
# ==============================================================================
fig, axs = plt.subplots(1, 2, figsize=(11, 2.5))
axs = axs.ravel()
data.loc[:end_train].plot(ax=axs[0], label='train')
data.loc[end_train:].plot(ax=axs[0], label='test')
axs[0].legend()
axs[0].set_title('Original data')

data_diff.loc[:end_train].plot(ax=axs[1], label='train')
data_diff.loc[end_train:].plot(ax=axs[1], label='test')
axs[1].legend()
axs[1].set_title('Differentiated data');

Train dates : 1949-01-01 00:00:00 --- 1955-12-01 00:00:00  (n=84)
Test dates  : 1956-01-01 00:00:00 --- 1960-12-01 00:00:00  (n=60)

Forecasting with Random Forest and Gradient Boosting

Two autoregressive forecasters are created, one with a scikit-learn RandomForestRegressor and the other with an XGBoost XGBRegressor. Both are trained on data from 1949-01-01 to 1955-12-01 and produce forecasts for the next 60 months (5 years).

# Forecasting without differentiation
# ==============================================================================
steps = len(data.loc[end_train:])

# Forecasters
forecaster_rf = ForecasterRecursive(
                    regressor = RandomForestRegressor(random_state=963),
                    lags      = 12
                )
forecaster_gb = ForecasterRecursive(
                    regressor = XGBRegressor(random_state=963),
                    lags      = 12
                )

# Train
forecaster_rf.fit(data.loc[:end_train])
forecaster_gb.fit(data.loc[:end_train])

# Predict
predictions_rf = forecaster_rf.predict(steps=steps)
predictions_gb = forecaster_gb.predict(steps=steps)

# Error
error_rf = mean_absolute_error(data.loc[end_train:], predictions_rf)
error_gb = mean_absolute_error(data.loc[end_train:], predictions_gb)
print(f"Error (MAE) Random Forest: {error_rf:.2f}")
print(f"Error (MAE) Gradient Boosting: {error_gb:.2f}")

# Plot
fig, ax = plt.subplots(figsize=(7, 3), sharex=True, sharey=True)
data.loc[:end_train].plot(ax=ax, label='train')
data.loc[end_train:].plot(ax=ax, label='test')
predictions_rf.plot(ax=ax, label='Random Forest')
predictions_gb.plot(ax=ax, label='Gradient Boosting')
ax.set_title('Forecasting without differentiation')
ax.set_xlabel('')
ax.legend();

Error (MAE) Random Forest: 66.10
Error (MAE) Gradient Boosting: 55.38

The plot shows that neither model is capable of accurately predicting the trend. After a few steps, the predictions become nearly constant, close to the maximum values observed in the training data.

Next, two new forecasters are trained using the same configuration, but with the argument differentiation = 1. This activates the internal process of differencing (order 1) the time series before training the model, and reverses the differentiation (also known as integration) for the predicted values.

# Forecasting with differentiation
# ==============================================================================
steps = len(data.loc[end_train:])

# Forecasters
forecaster_rf = ForecasterRecursive(
                    regressor       = RandomForestRegressor(random_state=963),
                    lags            = 12,
                    differentiation = 1
                )
forecaster_gb = ForecasterRecursive(
                    regressor       = XGBRegressor(random_state=963),
                    lags            = 12,
                    differentiation = 1
                )

# Train
forecaster_rf.fit(data.loc[:end_train])
forecaster_gb.fit(data.loc[:end_train])

# Predict
predictions_rf = forecaster_rf.predict(steps=steps)
predictions_gb = forecaster_gb.predict(steps=steps)

# Error
error_rf = mean_absolute_error(data.loc[end_train:], predictions_rf)
error_gb = mean_absolute_error(data.loc[end_train:], predictions_gb)
print(f"Error (MAE) Random Forest: {error_rf:.2f}")
print(f"Error (MAE) Gradient Boosting: {error_gb:.2f}")

# Plot
fig, ax = plt.subplots(figsize=(7, 3), sharex=True, sharey=True)
data.loc[:end_train].plot(ax=ax, label='train')
data.loc[end_train:].plot(ax=ax, label='test')
predictions_rf.plot(ax=ax, label='Random Forest')
predictions_gb.plot(ax=ax, label='Gradient Boosting')
ax.set_title('Forecasting with differentiation')
ax.set_xlabel('')
ax.legend();

Error (MAE) Random Forest: 53.76
Error (MAE) Gradient Boosting: 29.77

This time, both models are able to follow the trend in their predictions.

Deep dive into differencing time series

The previous example showed how easy it is to introduce differentiation into the forecasting process thanks to the functionalities available in skforecast. However, several non-trivial transformations have to be applied in order to achieve a smooth interaction.

In the next sections, the capabilities of the transformer TimeSeriesDifferentiator are introduced:

Differentiation and integration (reverse differentiation) of any given time series.
Why managing the differentiation internally has advantages over the traditional approach of pre-transforming the entire time series before initiating the model training.
How to manage the differentiation when applying the Forecaster to new data that does not immediately follow the training data.

TimeSeriesDifferentiator

TimeSeriesDifferentiator is a custom transformer that follows the scikit-learn preprocessing API. This means it has the methods fit, transform, fit_transform and inverse_transform.

# Differentiation with TimeSeriesDifferentiator
# ==============================================================================
y = np.array([5, 8, 12, 10, 14, 17, 21, 19], dtype=float)
diffenciator = TimeSeriesDifferentiator()
diffenciator.fit(y)
y_diff = diffenciator.transform(y)

print(f"Original time series   : {y}")
print(f"Differenced time series: {y_diff}")

Original time series   : [ 5.  8. 12. 10. 14. 17. 21. 19.]
Differenced time series: [nan  3.  4. -2.  4.  3.  4. -2.]

The process of differencing can be reversed (integration) using the inverse_transform method.

# Inverse transform
# ==============================================================================
diffenciator.inverse_transform(y_diff)

array([ 5.,  8., 12., 10., 14., 17., 21., 19.])

⚠ Warning

The inverse transformation process, inverse_transform, is applicable only to the same time series that was previously differentiated using the same TimeSeriesDifferentiator object. This limitation arises from the need to use the initial n values of the time series (n equals the order of differentiation) to successfully reverse the differentiation. These values are stored when the fit method is executed.
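The need for the initial n values can be illustrated with a NumPy-only sketch of second-order differencing. The helper integrate_once is hypothetical and is not part of skforecast; it only mimics the arithmetic of undoing one round of differencing.

```python
import numpy as np

def integrate_once(diffs, first_value):
    """Undo one round of differencing given the first value of the series."""
    return np.concatenate(([first_value], first_value + np.cumsum(diffs)))

y = np.array([5.0, 8.0, 12.0, 10.0, 14.0, 17.0, 21.0, 19.0])

d1 = np.diff(y)   # first-order differences
d2 = np.diff(d1)  # second-order differences, same as np.diff(y, n=2)

# Order 2 needs two stored values: they are derived from y[0] and y[1]
first_of_y = y[0]
first_of_d1 = y[1] - y[0]

d1_restored = integrate_once(d2, first_of_d1)
y_restored = integrate_once(d1_restored, first_of_y)

# With a wrong initial value the shape is recovered but the level is shifted
y_wrong = integrate_once(d1_restored, 0.0)

print(f"Restored            : {y_restored}")
print(f"Wrong initial value : {y_wrong}")
```

The second result differs from the original series by a constant offset, which is why the stored values must come from the same series that was differentiated.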

✎ Note

An additional method, inverse_transform_next_window, is available in the TimeSeriesDifferentiator. This method is designed to be used inside the Forecasters to reverse the differentiation of the predicted values. If the Forecaster regressor is trained with a differentiated time series, then the predicted values will be differentiated as well. The inverse_transform_next_window method returns the predictions to the original scale, under the assumption that they start immediately after the last observed values (last_window).
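The arithmetic behind this step, for order 1, can be sketched as follows. This is not the skforecast implementation: the function name is hypothetical, and the numbers are the rounded values reported in the backtesting tables of this document (last training value 278 and the first five differenced predictions).

```python
import numpy as np

def undo_first_difference_next_window(pred_diff, last_observed):
    """Hypothetical helper: return differenced predictions to the original
    scale, assuming they start right after `last_observed`."""
    return last_observed + np.cumsum(pred_diff)

last_observed = 278.0  # last value of the training series
pred_diff = np.array([25.18, -9.48, 28.98, 3.84, 0.27])

pred = undo_first_difference_next_window(pred_diff, last_observed)
print(np.round(pred, 2))  # approximately 303.18, 293.70, 322.68, 326.52, 326.79
```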

Internal differentiation vs pre-processing

Forecasters manage the differentiation process internally, so there is no need for additional pre-processing of the time series and post-processing of the predictions. This has several advantages, but before diving in, the results of both approaches are compared.

# Time series differentiated by preprocessing before training
# ==============================================================================
diferenciator = TimeSeriesDifferentiator(order=1)
data_diff = diferenciator.fit_transform(data.to_numpy())
data_diff = pd.Series(data_diff, index=data.index).dropna()

forecaster = ForecasterRecursive(
                 regressor = RandomForestRegressor(random_state=963),
                 lags      = 15
             )
forecaster.fit(y=data_diff.loc[:end_train])
predictions_diff = forecaster.predict(steps=steps)

# Revert differentiation to obtain final predictions
last_value_train = data.loc[:end_train].iloc[[-1]]
predictions_1 = pd.concat([last_value_train, predictions_diff]).cumsum()[1:]
predictions_1 = predictions_1.asfreq('MS')
predictions_1.name = 'pred'
predictions_1.head(5)

1956-01-01    303.18
1956-02-01    293.70
1956-03-01    322.68
1956-04-01    326.52
1956-05-01    326.79
Freq: MS, Name: pred, dtype: float64

# Time series differentiated internally by the forecaster
# ==============================================================================
forecaster = ForecasterRecursive(
                 regressor       = RandomForestRegressor(random_state=963),
                 lags            = 15,
                 differentiation = 1
             )
forecaster.fit(y=data.loc[:end_train])
predictions_2 = forecaster.predict(steps=steps)
predictions_2.head(5)

1956-01-01    303.18
1956-02-01    293.70
1956-03-01    322.68
1956-04-01    326.52
1956-05-01    326.79
Freq: MS, Name: pred, dtype: float64

# Compare both predictions
# ==============================================================================
pd.testing.assert_series_equal(predictions_1, predictions_2)

Next, the results of the backtesting process are compared. This comparison is more complex than the previous one, because the differentiation must be undone separately for each backtesting fold.

# Backtesting with the time series differentiated by preprocessing before training
# ==============================================================================
forecaster_1 = ForecasterRecursive(
                   regressor = RandomForestRegressor(random_state=963),
                   lags      = 15
               )
cv = TimeSeriesFold(
        steps              = 5,
        initial_train_size = len(data_diff.loc[:end_train]),
        refit              = True,
        fixed_train_size   = False,
     )
_, predictions_1 = backtesting_forecaster(
                        forecaster    = forecaster_1,
                        y             = data_diff,
                        cv            = cv,
                        metric        = 'mean_squared_error',
                    )

# Revert differentiation of predictions. Predictions of each fold must be reverted
# individually. An id is added to each prediction to identify the fold to which it belongs.
predictions_1 = predictions_1.rename(columns={'pred': 'pred_diff'})
folds = len(predictions_1) / cv.steps
folds = int(np.ceil(folds))
predictions_1['backtesting_fold_id'] = np.repeat(range(folds), cv.steps)[:len(predictions_1)]

# Add the previously observed value of the time series (only to the first prediction of each fold)
previous_overved_values = data.shift(1).loc[predictions_1.index].iloc[::cv.steps]
previous_overved_values.name = 'previous_overved_value'
predictions_1 = predictions_1.merge(
                    previous_overved_values,
                    left_index  = True,
                    right_index = True,
                    how         = 'left'
                )
predictions_1 = predictions_1.fillna(0)
predictions_1['summed_value'] = (
    predictions_1['pred_diff'] + predictions_1['previous_overved_value']
)

# Revert differentiation using the cumulative sum by fold
predictions_1['pred'] = (
    predictions_1
    .groupby('backtesting_fold_id')
    .apply(lambda x: x['summed_value'].cumsum(), include_groups=False)
    .to_numpy()
)

predictions_1.head(5)

	pred_diff	previous_overved_value	summed_value	pred
1956-01-01	25.18	278.0	303.18	303.18
1956-02-01	-9.48	0.0	-9.48	293.70
1956-03-01	28.98	0.0	28.98	322.68
1956-04-01	3.84	0.0	3.84	326.52
1956-05-01	0.27	0.0	0.27	326.79
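The per-fold reversal can also be written more compactly. The sketch below is one possible formulation with pandas, using a small hypothetical series and hypothetical differenced predictions (chosen equal to the true differences so the result is easy to check). Each fold is integrated starting from the last value observed before that fold.

```python
import numpy as np
import pandas as pd

# Hypothetical observed series (positions 0..6)
observed = pd.Series([100.0, 104.0, 109.0, 115.0, 122.0, 130.0, 139.0])

# Hypothetical differenced predictions for positions 3..6, in folds of 2 steps
pred_diff = pd.Series([6.0, 7.0, 8.0, 9.0], index=[3, 4, 5, 6])
steps = 2
fold_id = np.arange(len(pred_diff)) // steps  # [0, 0, 1, 1]

# Last observed value before the first step of each fold
anchor = observed.shift(1).loc[pred_diff.index]
anchor = anchor.groupby(fold_id).transform('first')

# Cumulative sum within each fold, added to that fold's starting value
pred = anchor + pred_diff.groupby(fold_id).cumsum()
print(pred)
```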

# Backtesting with the time series differentiated internally
# ==============================================================================
forecaster_2 = ForecasterRecursive(
                   regressor       = RandomForestRegressor(random_state=963),
                   lags            = 15,
                   differentiation = 1
               )
cv = TimeSeriesFold(
        steps              = 5,
        initial_train_size = len(data.loc[:end_train]),
        refit              = True,
        fixed_train_size   = False,
        differentiation    = 1
     )

_, predictions_2 = backtesting_forecaster(
                        forecaster    = forecaster_2,
                        y             = data,
                        cv            = cv,
                        metric        = 'mean_squared_error'
                    )

predictions_2.head(5)

  0%|          | 0/12 [00:00<?, ?it/s]

	pred
1956-01-01	303.18
1956-02-01	293.70
1956-03-01	322.68
1956-04-01	326.52
1956-05-01	326.79

# Compare both predictions
# ==============================================================================
pd.testing.assert_series_equal(predictions_1['pred'], predictions_2['pred'])

If, as demonstrated, the predictions are the same whether the time series is differentiated in a preprocessing step or the Forecaster manages the differentiation internally, why is the second alternative better?

Allowing the forecaster to manage all transformations internally guarantees that the same transformations are applied when the model is run on new data.
When the model is applied to new data that does not follow immediately after the training data (for example, if a model is not retrained for each prediction phase), the forecaster automatically increases the size of the last window needed to generate the predictors, as well as applying the differentiation to the incoming data and undoing it in the final predictions.

These transformations are non-trivial and very error-prone, so skforecast tries to avoid overcomplicating the already challenging task of forecasting time series.
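The reason the last window has to grow can be sketched with NumPy alone. The example assumes a hypothetical history, 3 lags and first-order differentiation, and uses the mean of the differenced lags as a stand-in for a trained regressor.

```python
import numpy as np

lags = 3
order = 1
window_size = lags + order  # raw observations needed to build the lags

# Hypothetical new data, not contiguous with the training set
history = np.array([100.0, 104.0, 109.0, 115.0, 122.0, 130.0])

# Taking only `lags` raw values yields too few differenced lags
too_short = np.diff(history[-lags:], n=order)

# Taking `lags + order` raw values yields exactly `lags` differenced lags
last_window = history[-window_size:]
lag_values = np.diff(last_window, n=order)

# Stand-in for the regressor: predicts the mean change (hypothetical)
pred_diff = lag_values.mean()

# Undo the differentiation starting from the last observed value
pred = last_window[-1] + pred_diff

print(f"Differenced lags : {lag_values}")
print(f"Prediction       : {pred}")
```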

Linear trees

Linear trees are a type of decision tree whose leaf nodes use linear models, rather than constant values, to make predictions. This makes it possible to extrapolate the trend beyond the observed range, since the linear model in a leaf can continue the trend in the same direction. This is an alternative to the differentiation process.
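The difference between a constant leaf and a linear leaf can be sketched with a single hypothetical leaf. This is a conceptual illustration only and does not reproduce how LightGBM fits its linear trees.

```python
import numpy as np

# Hypothetical leaf containing the training samples with x in [40, 50)
x_leaf = np.arange(40, 50, dtype=float)
y_leaf = 3.0 * x_leaf + 10.0  # exactly linear target, for clarity

# Constant leaf: predicts the mean of the targets that fell in the leaf
constant_leaf_value = y_leaf.mean()

# Linear leaf: fits a straight line to the samples in the leaf
slope, intercept = np.polyfit(x_leaf, y_leaf, deg=1)

# New point beyond the range seen in training
x_new = 60.0
pred_constant = constant_leaf_value
pred_linear = slope * x_new + intercept

print(f"Constant leaf prediction : {pred_constant:.1f}")
print(f"Linear leaf prediction   : {pred_linear:.1f}")
```

The constant leaf stays within the range of the training targets, while the linear leaf continues the slope it learned.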

# Forecasting with linear trees
# ==============================================================================
steps = len(data.loc[end_train:])

# Forecasters
regressor = LGBMRegressor(n_estimators=25, linear_tree=True, random_state=963, verbose=-1)
forecaster_lgbm_linear = ForecasterRecursive(
                            regressor = regressor,
                            lags      = 12
                         )
# Train
forecaster_lgbm_linear.fit(data.loc[:end_train])

# Predict
predictions_lgbm_linear = forecaster_lgbm_linear.predict(steps=steps)

# Error
error_lgbm_linear = mean_absolute_error(data.loc[end_train:], predictions_lgbm_linear)
print(f"Error (MAE) Gradient Boosting with linear trees: {error_lgbm_linear:.2f}")

# Plot
fig, ax = plt.subplots(figsize=(7, 3), sharex=True, sharey=True)
data.loc[:end_train].plot(ax=ax, label='train')
data.loc[end_train:].plot(ax=ax, label='test')
predictions_lgbm_linear.plot(ax=ax, label='Gradient Boosting with linear trees')
ax.set_title('Forecasting with linear trees')
ax.set_xlabel('')
ax.legend();

Session information

import session_info
session_info.show(html=False)

-----
lightgbm            4.6.0
matplotlib          3.10.1
numpy               2.2.5
pandas              2.2.3
session_info        v1.0.1
skforecast          0.16.0
sklearn             1.6.1
xgboost             3.0.0
-----
IPython             9.1.0
jupyter_client      8.6.3
jupyter_core        5.7.2
notebook            6.5.7
-----
Python 3.12.9 | packaged by Anaconda, Inc. | (main, Feb  6 2025, 18:56:27) [GCC 11.2.0]
Linux-6.11.0-25-generic-x86_64-with-glibc2.39
-----
Session information updated at 2025-05-13 22:32

Citation

How to cite this document

If you use this document or any part of it, please acknowledge the source, thank you!

Modelling time series trend with tree based models by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) at https://www.cienciadedatos.net/documentos/py49-modelling-time-series-trend-with-tree-based-models.html

How to cite skforecast

If you use skforecast for a publication, we would appreciate it if you cite the published software.

Zenodo:

Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2024). skforecast (v0.16.0). Zenodo. https://doi.org/10.5281/zenodo.8382788

APA:

Amat Rodrigo, J., & Escobar Ortiz, J. (2024). skforecast (Version 0.16.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788

BibTeX:

@software{skforecast, author = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier}, title = {skforecast}, version = {0.16.0}, month = {05}, year = {2025}, license = {BSD-3-Clause}, url = {https://skforecast.org/}, doi = {10.5281/zenodo.8382788} }

This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International license.

Allowed:

Share: copy and redistribute the material in any medium or format.
Adapt: remix, transform, and build upon the material.

Under the following terms:

Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial: You may not use the material for commercial purposes.
ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Modelling time series trend with tree based models

Joaquín Amat Rodrigo, Javier Escobar Ortiz

September, 2023 (last update May 2025)