More about forecasting in cienciadedatos.net
- Time series forecasting with machine learning
- ARIMA and SARIMAX models
- Time series forecasting with gradient boosting
- Exponential smoothing models
- ARAR forecasting models
- Visualizing time series data
- Electricity demand forecasting
- Global forecasting models I
- Global forecasting models II
- Global forecasting models III
- Clustering time series
- Forecasting with deep learning models
- Probabilistic forecasting I
- Probabilistic forecasting II
- Probabilistic forecasting III
- Data drift detection in time series forecasting models
- Forecasting categorical time series
- Interpretability and explainability in forecasting models
Introduction¶
ARIMA (AutoRegressive Integrated Moving Average) and SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors) are widely recognized and extensively utilized statistical forecasting models. These models comprise three components. The autoregressive element (AR) relates the current value to past (lagged) values. The moving average element (MA) assumes that the regression error is a linear combination of past forecast errors. Finally, the integrated component (I) indicates that the data values have been replaced with the difference between their values and the previous ones (and this differencing process may have been performed more than once).
While ARIMA models are well-known, SARIMAX models expand on the ARIMA framework by seamlessly incorporating seasonal patterns and exogenous variables.
In the ARIMA-SARIMAX model notation, the parameters $p$, $d$, and $q$ represent the autoregressive, differencing, and moving-average components, respectively. $P$, $D$, and $Q$ denote the same components for the seasonal part of the model, with $m$ representing the number of periods in each season.
$p$ is the order (number of time lags) of the autoregressive part of the model.
$d$ is the degree of differencing (the number of times that past values have been subtracted from the data).
$q$ is the order of the moving average part of the model.
$P$ is the order (number of time lags) of the seasonal part of the model.
$D$ is the degree of differencing (the number of times the data have had past values subtracted) of the seasonal part of the model.
$Q$ is the order of the moving average of the seasonal part of the model.
$m$ refers to the number of periods in each season.
When the terms $P$, $D$, $Q$, and $m$ are zero and no exogenous variables are included in the model, the SARIMAX model is equivalent to an ARIMA.
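For reference, and not stated explicitly in the original text, this parameterization corresponds to the standard seasonal ARIMA notation. Using the backshift operator $B$ (where $B\,y_t = y_{t-1}$), the model defined by these parameters can be written as

$$\phi_p(B)\,\Phi_P(B^m)\,(1 - B)^d\,(1 - B^m)^D\, y_t = \theta_q(B)\,\Theta_Q(B^m)\,\varepsilon_t$$

where $\phi_p$ and $\theta_q$ are polynomials of degree $p$ and $q$ in $B$, $\Phi_P$ and $\Theta_Q$ are polynomials of degree $P$ and $Q$ in $B^m$, and $\varepsilon_t$ is white noise. When exogenous regressors are included (SARIMAX), they typically enter as an additional linear regression term on $y_t$.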
The Python library skforecast provides two classes for creating models of the ARIMA family:
skforecast Sarimax: A wrapper for the well-known statsmodels SARIMAX that strictly follows the scikit-learn API. Similar to pmdarima, this version has been streamlined to include only the essential elements required by skforecast, resulting in significant performance improvements.
skforecast Arima: A native implementation that also follows the scikit-learn API. This version is optimized for speed using Numba JIT compilation. It supports seasonal components, exogenous features, and automated parameter searching, making it a versatile tool for the entire ARIMA family: ARIMA, SARIMAX, and AutoARIMA.
This guide delves into these libraries and explains how to build ARIMA-SARIMAX models using each of the implementations. In addition, it shows how the ForecasterStats class extends the capabilities of ARIMA-SARIMAX models by incorporating functionalities such as backtesting, hyperparameter tuning, probabilistic forecasting, and more. This integration allows users to leverage the strengths of ARIMA-SARIMAX models while benefiting from the advanced features provided by the skforecast library.
✏️ Note
This document is part of the series of articles about time series forecasting with statistical models:
For a more detailed explanation of ARIMA-SARIMAX models visit:
⚠️ Warning
Since version 0.19.0 of skforecast, the class Sarimax has been moved from skforecast.sarimax to skforecast.stats. Please update your imports to use from skforecast.stats import Sarimax.
Since version 0.19.0 of skforecast, the class ForecasterSarimax has been deprecated in favor of the new class ForecasterStats, which provides improved performance and broader compatibility with statistical models. Users are encouraged to transition to ForecasterStats for future projects.
Libraries¶
Libraries used in this document.
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from io import StringIO
import contextlib
import re
import matplotlib.pyplot as plt
# statsmodels
import statsmodels
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.stattools import kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose
# skforecast
import skforecast
from skforecast.datasets import fetch_dataset
from skforecast.plot import set_dark_theme, plot_prediction_intervals
from skforecast.stats import Sarimax, Arima
from skforecast.recursive import ForecasterStats
from skforecast.utils import expand_index
from skforecast.model_selection import TimeSeriesFold, backtesting_stats, grid_search_stats
import warnings
warnings.filterwarnings('once')
warnings.filterwarnings('ignore', category=DeprecationWarning)
color = '\033[1m\033[38;5;208m'
print(f"{color}Version skforecast: {skforecast.__version__}")
print(f"{color}Version statsmodels: {statsmodels.__version__}")
print(f"{color}Version pandas: {pd.__version__}")
print(f"{color}Version numpy: {np.__version__}")
Version skforecast: 0.20.0 Version statsmodels: 0.14.6 Version pandas: 2.3.3 Version numpy: 2.2.6
Data¶
The dataset in this document is a summary of monthly fuel consumption in Spain.
# Data download
# ==============================================================================
data = fetch_dataset(name='fuel_consumption', raw=True)
data = data[['Fecha', 'Gasolinas']]
data = data.rename(columns={'Fecha':'date', 'Gasolinas':'litters'})
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
data = data.set_index('date')
data = data.loc[:'1990-01-01 00:00:00']
data = data.asfreq('MS')
data = data['litters']
display(data.head(4))
╭──────────────────────────────── fuel_consumption ────────────────────────────────╮ │ Description: │ │ Monthly fuel consumption in Spain from 1969-01-01 to 2022-08-01. │ │ │ │ Source: │ │ Obtained from Corporación de Reservas Estratégicas de Productos Petrolíferos and │ │ Corporación de Derecho Público tutelada por el Ministerio para la Transición │ │ Ecológica y el Reto Demográfico. https://www.cores.es/es/estadisticas │ │ │ │ URL: │ │ https://raw.githubusercontent.com/skforecast/skforecast- │ │ datasets/main/data/consumos-combustibles-mensual.csv │ │ │ │ Shape: 644 rows x 6 columns │ ╰──────────────────────────────────────────────────────────────────────────────────╯
date 1969-01-01 166875.2129 1969-02-01 155466.8105 1969-03-01 184983.6699 1969-04-01 202319.8164 Freq: MS, Name: litters, dtype: float64
# Train-test dates
# ==============================================================================
end_train = '1983-01-01 23:59:59'
print(
f"Train dates : {data.index.min()} --- {data.loc[:end_train].index.max()} "
f"(n={len(data.loc[:end_train])})"
)
print(
f"Test dates : {data.loc[end_train:].index.min()} --- {data.loc[:].index.max()} "
f"(n={len(data.loc[end_train:])})"
)
data_train = data.loc[:end_train]
data_test = data.loc[end_train:]
# Plot
# ==============================================================================
set_dark_theme()
fig, ax=plt.subplots(figsize=(7, 3))
data_train.plot(ax=ax, label='train')
data_test.plot(ax=ax, label='test')
ax.set_title('Monthly fuel consumption in Spain')
ax.legend();
Train dates : 1969-01-01 00:00:00 --- 1983-01-01 00:00:00 (n=169) Test dates : 1983-02-01 00:00:00 --- 1990-01-01 00:00:00 (n=84)
Exploratory Analysis¶
Embarking on the journey to create an ARIMA model requires a comprehensive exploratory analysis. This critical step serves as a compass, guiding the analyst toward a deep understanding of the intrinsic dynamics of the data. Before fitting an ARIMA model to time series data, it is important to conduct an exploratory analysis to determine, at least, the following:
Stationarity: Stationarity means that the statistical properties (mean, variance...) remain constant over time, so time series with trends or seasonality are not stationary. Since ARIMA assumes the stationarity of the data, it is essential to subject the data to rigorous tests, such as the Augmented Dickey-Fuller test, to assess stationarity. If non-stationarity is found, the series should be differenced until stationarity is achieved. This analysis helps to determine the optimal value of the parameter $d$.
Autocorrelation analysis: Plot the autocorrelation and partial autocorrelation functions (ACF and PACF) to identify potential lag relationships between data points. This visual analysis provides insight into determining appropriate autoregressive (AR) and moving average (MA) terms ($p$ and $q$) for the ARIMA model.
Seasonal decomposition: In cases where seasonality is suspected, decomposing the series into trend, seasonal, and residual components using techniques such as moving averages or seasonal time series decomposition (STL) can reveal hidden patterns and help identify seasonality. This analysis helps to determine the optimal values of the parameters $P$, $D$, $Q$ and $m$.
These exploratory analyses establish the foundation for constructing an effective ARIMA model that captures the fundamental patterns and associations within the data.
Stationarity¶
There are several methods to assess whether a given time series is stationary or non-stationary:
Visual inspection of the time series: By visually inspecting the time series plot, it becomes possible to identify the presence of a noticeable trend or seasonality. If such patterns are apparent, it is probable that the series is non-stationary.
Summary statistics: Calculate summary statistics, such as the mean and variance, for various segments of the series. If significant differences exist, the series is not stationary (a small sketch follows this list).
Statistical tests: Apply statistical tests such as the Augmented Dickey-Fuller test or the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test.
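As a minimal sketch of the summary-statistics check mentioned above (not part of the original notebook), the mean and variance of the two halves of the training series can be compared directly:

# Summary statistics of two halves of the training series (sketch)
# ==============================================================================
half = len(data_train) // 2
first_half = data_train.iloc[:half]
second_half = data_train.iloc[half:]
print(f"Mean     | first half: {first_half.mean():>14.1f} | second half: {second_half.mean():>14.1f}")
print(f"Variance | first half: {first_half.var():>14.1f} | second half: {second_half.var():>14.1f}")

A large difference between the two halves, as happens here because of the upward trend, is a simple indication of non-stationarity.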
The previous plot shows a clear positive trend in the series, indicating a steady increase over time. Consequently, the mean of the series increases over time, confirming its non-stationarity.
Differencing is one of the easiest techniques to detrend a time series. A new series is generated where the value at the current time step is calculated as the difference between the original observation and the observation at the previous time step, i.e. the difference between consecutive values. Mathematically, the first difference is calculated as:
$$\Delta X_t = X_t - X_{t-1}$$

where $X_t$ is the value at time $t$ and $X_{t-1}$ is the value at time $t-1$. This is known as first-order differencing. This process can be repeated if necessary until the desired stationarity is reached.
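Applying the same operation to the already-differenced series yields the second-order difference, which is also examined below:

$$\Delta^2 X_t = \Delta X_t - \Delta X_{t-1} = X_t - 2X_{t-1} + X_{t-2}$$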
In the following sections, the original time series is subjected to both first and second-order differencing and statistical tests are applied to determine whether stationarity is achieved.
Augmented Dickey-Fuller test
The Augmented Dickey-Fuller test takes as its null hypothesis that the time series has a unit root - a characteristic of non-stationary time series. Conversely, the alternative hypothesis (under which the null hypothesis is rejected) is that the series is stationary.
Null hypothesis (H0): The series is not stationary (it has a unit root).
Alternative hypothesis (HA): The series is stationary with no unit root.
Since the null hypothesis assumes the presence of a unit root, the p-value obtained should be less than a specified significance level, often set at 0.05, to reject this hypothesis. Such a result indicates the stationarity of the series. The adfuller() function within the statsmodels library is a handy tool for implementing the ADF test. Its output includes the value of the test statistic, the p-value, the number of lags used in the test, and the critical values for three different significance levels.
Kwiatkowski-Phillips-Schmidt-Shin test (KPSS)
The KPSS test checks if a time series is stationary around a mean or linear trend. In this test, the null hypothesis is that the data are stationary, and we look for evidence that the null hypothesis is false. Consequently, small p-values (e.g., less than 0.05) reject the null hypothesis and suggest that differencing is required. The statsmodels library provides an implementation of the KPSS test via the kpss() function.
✏️ Note
While both tests are used to check stationarity,
- The KPSS test focuses on the presence of trends, and a low p-value indicates non-stationarity due to a trend.
- The ADF test focuses on the presence of a unit root, and a low p-value indicates that the time series does not have a unit root, suggesting it might be stationary.
# Test stationarity
# ==============================================================================
data_diff_1 = data_train.diff().dropna()
data_diff_2 = data_diff_1.diff().dropna()
# Suppress warnings for stationarity tests
with warnings.catch_warnings():
warnings.filterwarnings('ignore')
print('Test stationarity for original series')
print('-------------------------------------')
adfuller_result = adfuller(data)
kpss_result = kpss(data)
print(f'ADF Statistic: {adfuller_result[0]}, p-value: {adfuller_result[1]}')
print(f'KPSS Statistic: {kpss_result[0]}, p-value: {kpss_result[1]}')
print('\nTest stationarity for differenced series (order=1)')
print('--------------------------------------------------')
adfuller_result = adfuller(data_diff_1)
kpss_result = kpss(data.diff().dropna())
print(f'ADF Statistic: {adfuller_result[0]}, p-value: {adfuller_result[1]}')
print(f'KPSS Statistic: {kpss_result[0]}, p-value: {kpss_result[1]}')
print('\nTest stationarity for differenced series (order=2)')
print('--------------------------------------------------')
adfuller_result = adfuller(data_diff_2)
kpss_result = kpss(data.diff().diff().dropna())
print(f'ADF Statistic: {adfuller_result[0]}, p-value: {adfuller_result[1]}')
print(f'KPSS Statistic: {kpss_result[0]}, p-value: {kpss_result[1]}')
# Plot series
# ==============================================================================
fig, axs = plt.subplots(nrows=3, ncols=1, figsize=(7, 5), sharex=True)
data.plot(ax=axs[0], title='Original time series')
data_diff_1.plot(ax=axs[1], title='Differenced order 1')
data_diff_2.plot(ax=axs[2], title='Differenced order 2');
Test stationarity for original series ------------------------------------- ADF Statistic: -0.44612980998227997, p-value: 0.9021071923942665 KPSS Statistic: 2.2096370946978383, p-value: 0.01 Test stationarity for differenced series (order=1) -------------------------------------------------- ADF Statistic: -2.7436320757907513, p-value: 0.06678751673391206 KPSS Statistic: 0.313271162357279, p-value: 0.1 Test stationarity for differenced series (order=2) -------------------------------------------------- ADF Statistic: -10.490170523414776, p-value: 1.1524989419523892e-18 KPSS Statistic: 0.08065668267482215, p-value: 0.1
For the first-order differenced series, the KPSS test no longer rejects stationarity (p-value = 0.1) and the ADF p-value (about 0.067) is already close to the 0.05 threshold, while the second-order difference is clearly stationary according to both tests. Since over-differencing should be avoided, a value of 1 is the most appropriate selection for the ARIMA parameter $d$, complemented later with seasonal differencing.
Autocorrelation Analysis¶
Plotting the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the time series can provide insight into the appropriate values for $p$ and $q$. The ACF helps in identifying the value of $q$ (lag in the moving average part), while the PACF assists in identifying the value of $p$ (lag in the autoregressive part).
⚠️ Warning
If the stationarity analysis indicates that differencing is required, subsequent analyses should be conducted using the differenced series, as this will align with the manner in which the ARIMA model interprets the series.
Autocorrelation Function (ACF)
The ACF calculates the correlation between a time series and its lagged values. Within the context of ARIMA modeling, a sharp drop-off in the ACF after a few lags indicates that the data have a finite autoregressive order. The lag at which the ACF drops off provides an estimation of the value of $q$. If the ACF displays a sinusoidal or damped sinusoidal pattern, it suggests seasonality is present and requires consideration of seasonal orders in addition to non-seasonal orders.
Partial Autocorrelation Function (PACF)
The PACF measures the correlation between a lagged value and the current value of the time series, while accounting for the effect of the intermediate lags. In the context of ARIMA modeling, if the PACF sharply cuts off after a certain lag, while the remaining values are within the confidence interval, it suggests an AR model of that order. The lag, at which the PACF cuts off, gives an idea of the value of $p$.
✏️ Note
Some rules of thumb are:
Take the order of the AR term p to be equal to the number of lags that cross the significance limit in the PACF plot.
Take the order of the MA term q to be equal to the number of lags that cross the significance limit in the ACF plot.
If the ACF cuts off at lag q and the PACF cuts off at lag p, one could start with an ARIMA(p, d, q) model.
If only the PACF stops after lag p, one could start with an AR(p) model.
If only the ACF stops after lag q, one could start with an MA(q) model.
# Autocorrelation plot for original and differentiated series
# ==============================================================================
fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(6, 4), sharex=True)
plot_acf(data, ax=axs[0], lags=50, alpha=0.05)
axs[0].set_title('Autocorrelation original series')
plot_acf(data_diff_1, ax=axs[1], lags=50, alpha=0.05)
axs[1].set_title('Autocorrelation differentiated series (order=1)');
# Partial autocorrelation plot for original and differenced series
# ==============================================================================
fig, axs = plt.subplots(nrows=2, ncols=1, figsize=(6, 3), sharex=True)
plot_pacf(data, ax=axs[0], lags=50, alpha=0.05)
axs[0].set_title('Partial autocorrelation original series')
plot_pacf(data_diff_1, ax=axs[1], lags=50, alpha=0.05)
axs[1].set_title('Partial autocorrelation differenced series (order=1)');
plt.tight_layout();
Based on the partial autocorrelation function, the optimal value for parameter $p$ is 0. However, we will assign a value of 1 to provide an autoregressive component to the model. Regarding the $q$ component, the autocorrelation function suggests a value of 1.
Time series decomposition¶
Time series decomposition involves breaking down the original time series into its underlying components, namely trend, seasonality, and residual (error) components. The decomposition can be either additive or multiplicative. Combining time series decomposition with ACF and PACF analysis provides a comprehensive approach to understanding the underlying structure of the data and choosing appropriate ARIMA parameters.
# Time series decomposition of original versus differenced series
# ==============================================================================
res_decompose = seasonal_decompose(data, model='additive', extrapolate_trend='freq')
res_decompose_diff_1 = seasonal_decompose(data_diff_1, model='additive', extrapolate_trend='freq')
fig, axs = plt.subplots(nrows=4, ncols=2, figsize=(9, 6), sharex=True)
res_decompose.observed.plot(ax=axs[0, 0])
axs[0, 0].set_title('Original series', fontsize=12)
res_decompose.trend.plot(ax=axs[1, 0])
axs[1, 0].set_title('Trend', fontsize=12)
res_decompose.seasonal.plot(ax=axs[2, 0])
axs[2, 0].set_title('Seasonal', fontsize=12)
res_decompose.resid.plot(ax=axs[3, 0])
axs[3, 0].set_title('Residuals', fontsize=12)
res_decompose_diff_1.observed.plot(ax=axs[0, 1])
axs[0, 1].set_title('Differenced series (order=1)', fontsize=12)
res_decompose_diff_1.trend.plot(ax=axs[1, 1])
axs[1, 1].set_title('Trend', fontsize=12)
res_decompose_diff_1.seasonal.plot(ax=axs[2, 1])
axs[2, 1].set_title('Seasonal', fontsize=12)
res_decompose_diff_1.resid.plot(ax=axs[3, 1])
axs[3, 1].set_title('Residuals', fontsize=12)
fig.suptitle('Time series decomposition original series versus differenced series', fontsize=14)
fig.tight_layout();
The recurring pattern every 12 months suggests an annual seasonality, likely influenced by holiday factors. The ACF plot further supports the presence of seasonality, as significant peaks occur at lags corresponding to the 12-month intervals, confirming the idea of recurring patterns.
Conclusions¶
Based on the results of the exploratory analysis, utilizing a combination of first-order differencing and seasonal differencing may be the most appropriate approach. First-order differencing is effective in capturing transitions between observations and highlighting short-term fluctuations. Concurrently, seasonal differencing, which covers a period of 12 months and represents the shift from one year to the next, effectively captures the inherent cyclic patterns in the data. This approach allows us to achieve the necessary stationarity for the following ARIMA modeling process.
# First-order differentiation combined with seasonal differentiation
# ==============================================================================
data_diff_1_12 = data_train.diff().diff(12).dropna()
warnings.filterwarnings("ignore")
adfuller_result = adfuller(data_diff_1_12)
print(f'ADF Statistic: {adfuller_result[0]}, p-value: {adfuller_result[1]}')
kpss_result = kpss(data_diff_1_12)
print(f'KPSS Statistic: {kpss_result[0]}, p-value: {kpss_result[1]}')
warnings.filterwarnings("default")
ADF Statistic: -4.86234654148435, p-value: 4.126322028795106e-05 KPSS Statistic: 0.29553731751033135, p-value: 0.1
⚠️ Warning
Exploratory data analysis is an evolving and iterative process in which the insights gained have the potential to change as the process progresses. Remember that all previous plots only provide initial guidance, and the optimal values of $p$, $q$, and $d$ should be selected based on a combination of these plots, statistical criteria such as AIC and BIC, and a time series validation such as backtesting.
ARIMA-SARIMAX model¶
The following sections focus on how to train an ARIMA model and forecast future values with each of the three implementations: statsmodels, skforecast Sarimax, and skforecast Arima.
Statsmodels¶
In statsmodels, a distinction is made between the process of defining a model and training it. This approach may be familiar to R programming language users, but it may seem somewhat unconventional to those accustomed to libraries like scikit-learn or XGBoost in the Python ecosystem.
The process begins with the model definition, which includes configurable parameters and the training dataset. When the fit method is invoked, instead of modifying the model object, as is typical in Python libraries, statsmodels creates a new SARIMAXResults object. This object not only encapsulates essential details like residuals and learned parameters, but also provides the necessary tools to generate predictions.
# ARIMA model with statsmodels.Sarimax
# ==============================================================================
warnings.filterwarnings("ignore", category=UserWarning, message='Non-invertible|Non-stationary')
model = SARIMAX(endog = data_train, order = (1, 1, 1), seasonal_order = (1, 1, 1, 12))
model_res = model.fit(disp=0)
model_res.summary()
| Dep. Variable: | litters | No. Observations: | 169 |
|---|---|---|---|
| Model: | SARIMAX(1, 1, 1)x(1, 1, 1, 12) | Log Likelihood | -1765.688 |
| Date: | Sun, 01 Feb 2026 | AIC | 3541.375 |
| Time: | 20:02:17 | BIC | 3556.625 |
| Sample: | 01-01-1969 | HQIC | 3547.569 |
| - 01-01-1983 | |||
| Covariance Type: | opg |
| coef | std err | z | P>|z| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| ar.L1 | -0.3524 | 0.143 | -2.470 | 0.014 | -0.632 | -0.073 |
| ma.L1 | -0.1970 | 0.145 | -1.358 | 0.174 | -0.481 | 0.087 |
| ar.S.L12 | 0.0509 | 0.109 | 0.465 | 0.642 | -0.164 | 0.265 |
| ma.S.L12 | -0.4676 | 0.140 | -3.333 | 0.001 | -0.743 | -0.193 |
| sigma2 | 3.962e+08 | 3.32e-10 | 1.19e+18 | 0.000 | 3.96e+08 | 3.96e+08 |
| Ljung-Box (L1) (Q): | 3.98 | Jarque-Bera (JB): | 14.14 |
|---|---|---|---|
| Prob(Q): | 0.05 | Prob(JB): | 0.00 |
| Heteroskedasticity (H): | 1.11 | Skew: | -0.49 |
| Prob(H) (two-sided): | 0.72 | Kurtosis: | 4.10 |
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 3.88e+33. Standard errors may be unstable.
The model summary shows a lot of information about the fitting process:
Model Fit Statistics: This part includes several statistics that help you evaluate how well the model fits the data:
Log-Likelihood: A measure of how well the model explains the observed data. When fitting an ARIMA model, negative log-likelihood values will be encountered, with more negative values indicating a poorer fit to the data and values closer to zero indicating a better fit.
AIC (Akaike Information Criterion): A goodness-of-fit metric that balances the fit of the model with its complexity. Lower AIC values are preferred.
BIC (Bayesian Information Criterion): Similar to AIC, but penalizes model complexity more. As with AIC, lower BIC values are better.
HQIC (Hannan-Quinn Information Criterion): Another model selection criterion, similar to AIC and BIC.
Coefficients: This table lists the estimated coefficients for the parameters of the model. It includes both autoregressive (AR) and moving average (MA) parameters, as well as any exogenous variables if they are included in the model. It also includes the standard errors associated with the estimated coefficients to indicate the uncertainty in the parameter estimates, their P-values, which are used to assess the significance of each coefficient, and the 95% confidence interval.
Model diagnostics: This section provides information about the residuals, i.e. the differences between the observed (training) values and the values predicted by the model (a diagnostics sketch follows this list):
Ljung-Box test: A test for autocorrelation in the residuals.
Jarque-Bera test: A test of the normality of the residuals.
Skewness and kurtosis: Measures of the shape of the distribution of the residuals.
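These diagnostics can also be obtained programmatically from the fitted results object. The following sketch relies on the statsmodels results API (test_serial_correlation, test_normality, and plot_diagnostics, as documented by statsmodels) and is not part of the original example:

# Residual diagnostics of the fitted statsmodels model (sketch)
# ==============================================================================
ljung_box = model_res.test_serial_correlation(method='ljungbox')
jarque_bera = model_res.test_normality(method='jarquebera')
print("Ljung-Box (statistics and p-values per lag):")
print(ljung_box[0])
print("Jarque-Bera (statistic, p-value, skew, kurtosis):")
print(jarque_bera[0])
model_res.plot_diagnostics(figsize=(9, 6))
plt.tight_layout();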
# Prediction
# ==============================================================================
predictions_statsmodels = model_res.get_forecast(steps=len(data_test)).predicted_mean
predictions_statsmodels.name = 'predictions_statsmodels'
display(predictions_statsmodels.head(4))
1983-02-01 408707.295485 1983-03-01 465933.670647 1983-04-01 506906.544811 1983-05-01 463579.002354 Freq: MS, Name: predictions_statsmodels, dtype: float64
Skforecast Sarimax¶
Skforecast Sarimax wraps the statsmodels.SARIMAX model and adapts it to the scikit-learn API.
# ARIMA model with skforecast Sarimax
# ==============================================================================
warnings.filterwarnings("ignore", category=UserWarning, message='Non-invertible|Non-stationary')
model = Sarimax(order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
model.fit(y=data_train)
model.summary()
| Dep. Variable: | litters | No. Observations: | 169 |
|---|---|---|---|
| Model: | SARIMAX(1, 1, 1)x(1, 1, 1, 12) | Log Likelihood | -1765.688 |
| Date: | Sun, 01 Feb 2026 | AIC | 3541.375 |
| Time: | 20:02:18 | BIC | 3556.625 |
| Sample: | 01-01-1969 | HQIC | 3547.569 |
| - 01-01-1983 | |||
| Covariance Type: | opg |
| coef | std err | z | P>|z| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| ar.L1 | -0.3524 | 0.143 | -2.470 | 0.014 | -0.632 | -0.073 |
| ma.L1 | -0.1970 | 0.145 | -1.358 | 0.174 | -0.481 | 0.087 |
| ar.S.L12 | 0.0509 | 0.109 | 0.465 | 0.642 | -0.164 | 0.265 |
| ma.S.L12 | -0.4676 | 0.140 | -3.333 | 0.001 | -0.743 | -0.193 |
| sigma2 | 3.962e+08 | 3.32e-10 | 1.19e+18 | 0.000 | 3.96e+08 | 3.96e+08 |
| Ljung-Box (L1) (Q): | 3.98 | Jarque-Bera (JB): | 14.14 |
|---|---|---|---|
| Prob(Q): | 0.05 | Prob(JB): | 0.00 |
| Heteroskedasticity (H): | 1.11 | Skew: | -0.49 |
| Prob(H) (two-sided): | 0.72 | Kurtosis: | 4.10 |
Warnings:
[1] Covariance matrix calculated using the outer product of gradients (complex-step).
[2] Covariance matrix is singular or near-singular, with condition number 3.88e+33. Standard errors may be unstable.
# Prediction
# ==============================================================================
predictions_skforecast_sarimax = model.predict(steps=len(data_test))
predictions_skforecast_sarimax = pd.DataFrame(predictions_skforecast_sarimax, index=data_test.index)
predictions_skforecast_sarimax.columns = ['skforecast']
display(predictions_skforecast_sarimax.head(4))
| skforecast | |
|---|---|
| date | |
| 1983-02-01 | 408707.295485 |
| 1983-03-01 | 465933.670647 |
| 1983-04-01 | 506906.544811 |
| 1983-05-01 | 463579.002354 |
Skforecast Arima¶
Since version 0.20.0 of skforecast, the native implementation of ARIMA models is available through the Arima class. This implementation is optimized for speed using Numba JIT compilation. It supports seasonal components, exogenous features, and automated parameter searching, making it a versatile tool for the entire ARIMA family: ARIMA, SARIMAX, and AutoARIMA.
💡 Tip
Skforecast's ARIMA implementation is optimized for speed using just-in-time compilation with Numba. As a result, the first fit of the model is slower due to the compilation process, but subsequent fits and predictions are significantly faster compared to the statsmodels implementation.
# ARIMA model with skforecast Arima
# ==============================================================================
model = Arima(order=(1, 1, 1), seasonal_order=(1, 1, 1), m=12)
model.fit(y=data_train, suppress_warnings=True)
model.summary()
ARIMA Model Summary ============================================================ Model : Arima(1,1,1)(1,1,1)[12] Method : ARIMA(1,1,1)(1,1,1)[12] Converged : False Coefficients: ------------------------------------------------------------ ar1 : -0.3083 (SE: 0.1137, t: -2.71) ma1 : -0.5583 (SE: 0.1093, t: -5.11) sar1 : -0.1390 (SE: 0.1160, t: -1.20) sma1 : -0.5581 (SE: 0.1010, t: -5.53) Model fit statistics: sigma^2: 249194557.351875 Log-likelihood: -1733.55 AIC: 3477.11 BIC: N/A Residual statistics: Mean: -1167.159495 Std Dev: 15166.561117 MAE: 10860.580394 RMSE: 15166.599660 Time Series Summary Statistics: Number of observations: 169 Mean: 384743.1773 Std Dev: 108126.6689 Min: 155466.8105 25%: 303667.7591 Median: 397278.0241 75%: 466194.3073 Max: 605073.0143
For performance reasons, predictions are returned as NumPy arrays. These can be easily converted into Pandas Series by mapping them to the corresponding time index.
# Prediction
# ==============================================================================
predictions_skforecast_arima = model.predict(steps=len(data_test))
pred_index = expand_index(index=data_train.index, steps=len(data_test))
predictions_skforecast_arima = pd.Series(predictions_skforecast_arima, index=pred_index)
predictions_skforecast_arima.head(4)
1983-02-01 414750.833100 1983-03-01 465097.100031 1983-04-01 504456.783383 1983-05-01 472048.144760 Freq: MS, dtype: float64
Comparison of predictions¶
# Plot predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data_train.plot(ax=ax, label='train')
data_test.plot(ax=ax, label='test')
predictions_statsmodels.plot(ax=ax, label='statsmodels')
predictions_skforecast_sarimax.plot(ax=ax, label='skforecast sarimax')
predictions_skforecast_arima.plot(ax=ax, label='skforecast arima')
ax.set_title('Predictions with ARIMA models')
ax.legend();
✏️ Note
Since skforecast Sarimax is a wrapper of statsmodels SARIMAX, the results are identical. The skforecast Arima implementation has been developed following the same mathematical principles as statsmodels; however, due to differences in numerical precision and optimization techniques, slight variations in results may occur. These differences are typically minimal and should not significantly impact the overall performance or conclusions drawn from the model.
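To quantify how close the three sets of predictions are, the series obtained above can be placed side by side (a small sketch, not part of the original notebook):

# Numerical comparison of the three prediction series
# ==============================================================================
comparison = pd.DataFrame({
    'statsmodels'       : predictions_statsmodels,
    'skforecast_sarimax': predictions_skforecast_sarimax['skforecast'],
    'skforecast_arima'  : predictions_skforecast_arima,
})
comparison['abs_diff_arima_vs_statsmodels'] = (
    comparison['skforecast_arima'] - comparison['statsmodels']
).abs()
comparison.head(4)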
ForecasterStats¶
The ForecasterStats class extends the capabilities of ARIMA-SARIMAX models by incorporating functionalities such as backtesting, hyperparameter tuning, probabilistic forecasting, and more. This integration allows users to leverage the strengths of ARIMA-SARIMAX models while benefiting from the advanced features provided by the skforecast library.
Train¶
The train-prediction process follows an API similar to that of scikit-learn. More details in the ForecasterStats user guide.
# ForecasterStats with Arima model
# ==============================================================================
forecaster = ForecasterStats(
estimator=Arima(order=(1, 1, 1), seasonal_order=(1, 1, 1), m=12)
)
forecaster.fit(y=data_train, suppress_warnings=True)
forecaster
ForecasterStats
General Information
- Estimators:
- skforecast.Arima: Arima(1,1,1)(1,1,1)[12]
- Window size: 1
- Series name: litters
- Exogenous included: False
- Creation date: 2026-02-01 20:02:20
- Last fit date: 2026-02-01 20:02:20
- Skforecast version: 0.20.0
- Python version: 3.13.11
- Forecaster id: None
Exogenous Variables
-
None
Data Transformations
- Transformer for y: None
- Transformer for exog: None
Training Information
- Training range: [Timestamp('1969-01-01 00:00:00'), Timestamp('1983-01-01 00:00:00')]
- Training index type: DatetimeIndex
- Training index frequency: MS
Estimator Parameters
- skforecast.Arima: {'order': (1, 1, 1), 'seasonal_order': (1, 1, 1), 'm': 12, 'include_mean': True, 'transform_pars': True, 'method': 'CSS-ML', 'n_cond': None, 'SSinit': 'Gardner1980', 'optim_method': 'BFGS', 'optim_kwargs': {'maxiter': 1000}, 'kappa': 1000000.0}
Fit Kwargs
-
None
Prediction¶
Once the model is fitted, it can be used to forecast future observations. It is important to note that these types of models require predictions to follow immediately after the training data; therefore, the forecast starts right after the last observed value.
# Predictions
# ==============================================================================
steps = len(data_test)
predictions = forecaster.predict(steps=steps)
predictions.head(3)
1983-02-01 492215.254120 1983-03-01 570603.230916 1983-04-01 605183.040580 1983-05-01 577888.039998 Freq: MS, Name: pred, dtype: float64
Prediction intervals¶
The method predict_interval enables the calculation of prediction intervals for the forecasted values. Users can specify the confidence level of the estimated interval using either the alpha or interval argument.
# Prediction intervals
# ==============================================================================
predictions = forecaster.predict_interval(steps=steps, alpha=0.05)
predictions.head(3)
| pred | lower_bound | upper_bound | |
|---|---|---|---|
| 1983-02-01 | 414750.833100 | 383536.923137 | 445964.743062 |
| 1983-03-01 | 465097.100031 | 431512.347378 | 498681.852684 |
| 1983-04-01 | 504456.783383 | 469458.598144 | 539454.968623 |
# Plot prediction intervals
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3))
plot_prediction_intervals(
predictions = predictions,
y_true = data_test,
target_variable = "litters",
title = "Prediction intervals in test data",
kwargs_fill_between = {'color': 'white', 'alpha': 0.3, 'zorder': 1},
ax = ax
)
Feature importances¶
For the ARIMA model, the method get_feature_importances returns the estimated coefficients of the model (AR, MA, and seasonal components). These coefficients indicate the weight assigned to each lag term in the model's equations.
# Feature importances
# ==============================================================================
model.get_feature_importances()
| feature | importance | |
|---|---|---|
| 0 | ar1 | -0.308331 |
| 1 | ma1 | -0.558260 |
| 2 | sar1 | -0.138975 |
| 3 | sma1 | -0.558092 |
Prediction on training data (In-sample Predictions)¶
Predictions on the training data are crucial for evaluating the accuracy and effectiveness of the model. By comparing the predicted values with the actual observed values in the training dataset, you can assess how well the model has learned the underlying patterns and trends in the data. This comparison helps in understanding the model's performance and identifying areas where it may need improvement or adjustment. In essence, they act as a mirror, reflecting how the model interprets and reconstructs the historical data on which it was trained.
Predictions of the observations used to fit the model are stored in the fitted_values_ attribute of the Arima object.
# In-sample Predictions
# ==============================================================================
# Show only the first 5 values
model.fitted_values_[:5]
array([166778.86748562, 155441.28198886, 184949.38956579, 202282.72293861,
206228.03030826])
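To visualize how the model reconstructs the training data, the fitted values can be aligned with the training index and plotted against the observed series. This is a small sketch that assumes fitted_values_ contains one value per training observation:

# Plot in-sample predictions versus observed values (sketch)
# ==============================================================================
fitted_values = pd.Series(model.fitted_values_, index=data_train.index, name='fitted')
fig, ax = plt.subplots(figsize=(7, 3))
data_train.plot(ax=ax, label='train')
fitted_values.plot(ax=ax, label='in-sample predictions')
ax.set_title('In-sample predictions (Arima)')
ax.legend();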
Backtesting¶
The following example shows the application of backtesting to assess the performance of the seasonal ARIMA model when generating forecasts for the upcoming 12 months on an annual schedule. In this context, a forecast is generated at the end of each December, predicting values for the subsequent 12 months.
✏️ Note
Why do statistical models require refitting during backtesting?
Unlike machine learning models, statistical models like ARIMA maintain an internal state that depends on the sequence of observations. They can only generate predictions starting from the last observed time step — they cannot "jump" to an arbitrary point in the future without knowing all previous values.
During backtesting, when the validation window moves forward, the model must be refitted to incorporate the new observations and update its internal state. This is why refit=True is typically required.
Performance optimization: Because refitting is mandatory, skforecast's Numba-optimized backend becomes essential. It enables hundreds of refits during backtesting in a fraction of the time required by non-optimized libraries.
Exception: The skforecast.stats.Sarimax model implements an efficient state-space representation that allows updating predictions without full model refitting.
# Create forecaster
# ==============================================================================
forecaster = ForecasterStats(
estimator=Arima(order=(1, 1, 1), seasonal_order=(1, 1, 1), m=12)
)
# Backtest forecaster
# ==============================================================================
cv = TimeSeriesFold(
steps = 12,
initial_train_size = len(data_train),
refit = True,
fixed_train_size = False,
)
metrics, predictions = backtesting_stats(
forecaster = forecaster,
y = data,
cv = cv,
metric = ['mean_absolute_error', 'mean_absolute_percentage_error'],
suppress_warnings = True,
verbose = True,
)
Information of folds
--------------------
Number of observations used for initial training: 169
Number of observations used for backtesting: 84
Number of folds: 7
Number skipped folds: 0
Number of steps per fold: 12
Number of steps to exclude between last observed data (last window) and predictions (gap): 0
Fold: 0
Training: 1969-01-01 00:00:00 -- 1983-01-01 00:00:00 (n=169)
Validation: 1983-02-01 00:00:00 -- 1984-01-01 00:00:00 (n=12)
Fold: 1
Training: 1969-01-01 00:00:00 -- 1984-01-01 00:00:00 (n=181)
Validation: 1984-02-01 00:00:00 -- 1985-01-01 00:00:00 (n=12)
Fold: 2
Training: 1969-01-01 00:00:00 -- 1985-01-01 00:00:00 (n=193)
Validation: 1985-02-01 00:00:00 -- 1986-01-01 00:00:00 (n=12)
Fold: 3
Training: 1969-01-01 00:00:00 -- 1986-01-01 00:00:00 (n=205)
Validation: 1986-02-01 00:00:00 -- 1987-01-01 00:00:00 (n=12)
Fold: 4
Training: 1969-01-01 00:00:00 -- 1987-01-01 00:00:00 (n=217)
Validation: 1987-02-01 00:00:00 -- 1988-01-01 00:00:00 (n=12)
Fold: 5
Training: 1969-01-01 00:00:00 -- 1988-01-01 00:00:00 (n=229)
Validation: 1988-02-01 00:00:00 -- 1989-01-01 00:00:00 (n=12)
Fold: 6
Training: 1969-01-01 00:00:00 -- 1989-01-01 00:00:00 (n=241)
Validation: 1989-02-01 00:00:00 -- 1990-01-01 00:00:00 (n=12)
# Backtest predictions
# ==============================================================================
predictions.head(4)
| fold | pred | |
|---|---|---|
| 1983-02-01 | 0 | 414750.833100 |
| 1983-03-01 | 0 | 465097.100031 |
| 1983-04-01 | 0 | 504456.783383 |
| 1983-05-01 | 0 | 472048.144760 |
# Backtest metrics
# ==============================================================================
metrics
| mean_absolute_error | mean_absolute_percentage_error | |
|---|---|---|
| 0 | 19453.46286 | 0.035154 |
# Plot backtest predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3))
data.loc[end_train:].plot(ax=ax, label='test')
predictions['pred'].plot(ax=ax)
ax.set_title('Backtest predictions (folds of 12 months)')
ax.legend();
Model tuning (hyperparameters p, d, q)¶
The exploratory analysis has successfully narrowed down the search space for the optimal hyperparameters of the model. However, to definitively determine the most appropriate values, the use of strategic search methods is essential. Among these methods, two widely used approaches are:
Statistical Criteria: Information criterion metrics, such as Akaike's Information Criterion (AIC) or Bayesian Information Criterion (BIC), use different penalties on the maximum likelihood (log-likelihood) estimate of the model as a measure of fit. The advantage of using such criteria is that they are computed only on the training data, eliminating the need for predictions on new data. As a result, the optimization process is greatly accelerated. The well-known Auto Arima algorithm uses this approach.
Validation Techniques: The use of validation techniques, especially backtesting, is another effective strategy. Backtesting involves evaluating the performance of the model using historical data to simulate real-world conditions. This helps to validate the effectiveness of the hyperparameters under different scenarios, providing a practical assessment of their viability.
In the first approach, calculations are based solely on training data, eliminating the need for a separate data partition. This makes the optimization process extremely efficient. However, it is important to note that information criteria only measure the relative quality of models within the defined search space. A model with the lowest AIC could still be a poor fit in absolute terms. Therefore, the selected model should ideally undergo a backtesting phase. This phase calculates forecast error metrics (such as MAE, MSE, or MAPE) to validate performance on a meaningful, interpretable scale.
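For reference, both criteria penalize the maximized log-likelihood $\hat{L}$ of the model with the number of estimated parameters $k$; the BIC additionally involves the number of observations $n$:

$$\text{AIC} = 2k - 2\ln(\hat{L}) \qquad \text{BIC} = k\ln(n) - 2\ln(\hat{L})$$

Lower values of either criterion indicate a better trade-off between fit and complexity.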
On the other hand, the second approach - validation techniques - tends to be more time-consuming, since the model must be trained and then evaluated on new data. However, the results generated are often more robust, and the metrics derived can provide deeper insights.
💡 Tip
In summary, while the statistical criteria approach offers speed and efficiency, validation techniques provide a more comprehensive and insightful evaluation, albeit at a slower pace due to their reliance on new data for testing. Fortunately, for sufficiently large data sets, they all lead to the same model.
⚠️ Warning
When evaluating ARIMA-SARIMAX models, it is important to note that AIC assumes that all models are trained on the same data. Thus, using AIC to decide between different orders of differencing is technically invalid, since one data point is lost with each order of differencing. Therefore, the Auto Arima algorithm uses a unit root test to select the order of differencing, and only uses the AIC to select the order of the AR and MA components. For a detailed explanation of Akaike's Information Criterion (AIC) see Rob J Hyndman's blog and AIC Myths and Misunderstandings by Anderson and Burnham.
AutoArima¶
AutoArima is an algorithm designed to automate the selection of optimal hyperparameters for an ARIMA model. The algorithm systematically evaluates various combinations of non-seasonal parameters ($p, d, q$), seasonal parameters ($P, D, Q$), and the seasonal period ($m$) to identify the configuration that best fits the data based on a specified criterion, typically the Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC).
With this strategy, calculations are based solely on the training data, eliminating the separate validation partition required by other strategies such as grid search and making the optimization process extremely efficient. However, as discussed above, information criteria only measure the relative quality of models within the defined search space, so a model with the lowest AIC could still be a poor fit in absolute terms, and the selected model should ideally still undergo a backtesting phase that calculates forecast error metrics (such as MAE, MSE, or MAPE) on a meaningful, interpretable scale.
Skforecast's Arima class triggers the AutoArima functionality whenever the order or seasonal_order arguments are set to None. After the model is fitted, the optimal parameters can be accessed via the best_params_ attribute. For all subsequent predictions, the model automatically utilizes the optimal configuration identified during the fitting process.
# Skforecast Auto Arima
# ==============================================================================
auto_arima = Arima(
order = None, # Must be None to use AutoArima
seasonal_order = None, # Must be None to use AutoArima
start_p = 0,
start_q = 0,
max_p = 5,
max_q = 5,
max_P = 5,
max_Q = 2,
max_order = 5,
max_d = 2,
max_D = 1,
ic = "aic",
m = 12,
stepwise = True, # True for faster results
trace = True, # True for detailed information of the process
)
auto_arima.fit(y=data_train, suppress_warnings=True)
auto_arima
Fitting models using approximations... ARIMA(p,d,q)(P,D,Q)[m] : aic ------------------------------------------------ ARIMA(0,1,0)(1,1,1)[12] : 3564.3887 ARIMA(0,1,0)(0,1,0)[12] : 3637.1363 ARIMA(1,1,0)(1,1,0)[12] : 3506.9117 ARIMA(0,1,1)(0,1,1)[12] : 3481.3432 ARIMA(0,1,1)(0,1,0)[12] : 3542.1014 ARIMA(0,1,1)(1,1,1)[12] : 3482.0043 ARIMA(0,1,1)(0,1,2)[12] : 3481.2735 ARIMA(0,1,1)(1,1,2)[12] : 3482.5762 ARIMA(0,1,0)(0,1,2)[12] : 3563.7959 ARIMA(1,1,1)(0,1,2)[12] : 3476.4900 ARIMA(1,1,1)(0,1,1)[12] : 3476.4887 ARIMA(1,1,1)(0,1,0)[12] : 3531.8215 ARIMA(1,1,1)(1,1,1)[12] : 3477.1082 ARIMA(1,1,1)(1,1,0)[12] : 3493.3277 ARIMA(1,1,1)(1,1,2)[12] : 3477.9947 ARIMA(1,1,0)(0,1,1)[12] : 3491.3027 ARIMA(2,1,1)(0,1,1)[12] : 3477.5383 ARIMA(1,1,2)(0,1,1)[12] : 3477.8528 ARIMA(0,1,0)(0,1,1)[12] : 3563.8825 ARIMA(0,1,2)(0,1,1)[12] : 3478.2811 ARIMA(2,1,0)(0,1,1)[12] : 3482.5359 ARIMA(2,1,2)(0,1,1)[12] : 3479.4579 Now re-fitting the best model(s) without approximations... ARIMA(1,1,1)(0,1,1)[12] : 3476.4887 Best model found: ARIMA(1,1,1)(0,1,1)[12] with aic: 3476.488679996082
AutoArima(1,1,1)(0,1,1)[12]
Parameters
| Parameter | Value |
|---|---|
| order | None |
| seasonal_order | None |
| m | 12 |
| include_mean | True |
| transform_pars | True |
| method | 'CSS-ML' |
| n_cond | None |
| SSinit | 'Gardner1980' |
| optim_method | 'BFGS' |
| optim_kwargs | {'maxiter': 1000} |
| kappa | 1000000.0 |
# Optimal parameters selected by AutoArima
# ==============================================================================
print(f"Best parameters: {auto_arima.best_params_}")
Best parameters: {'order': (1, 1, 1), 'seasonal_order': (0, 1, 1), 'm': 12}
It may be of interest to capture the trace generated during the AutoArima search to allow a more comprehensive exploration of the results. The trace is printed to standard output, but it can be captured and stored in a structured pandas DataFrame.
# Capture auto_arima trace in a pandas dataframe
# ==============================================================================
buffer = StringIO()
with contextlib.redirect_stdout(buffer):
auto_arima = Arima(
order = None, # Must be None to use AutoArima
seasonal_order = None, # Must be None to use AutoArima
start_p = 0,
start_q = 0,
max_p = 5,
max_q = 5,
max_P = 5,
max_Q = 2,
max_order = 5,
max_d = 2,
max_D = 1,
ic = "aic",
m = 12,
stepwise = True, # True for faster results
trace = True, # True for detailed information of the process
)
auto_arima.fit(y=data_train, suppress_warnings=True)
trace_autoarima = buffer.getvalue()
# Pattern matches: ARIMA(p,d,q)(P,D,Q)[m] : aic_value
pattern = r"ARIMA\((\d+),(\d+),(\d+)\)\((\d+),(\d+),(\d+)\)\[(\d+)\]\s+:\s+([\d\.]+)"
matches = re.findall(pattern, trace_autoarima)
results = pd.DataFrame(
matches, columns=["p", "d", "q", "P", "D", "Q", "m", "AIC"]
)
results["order"] = results[["p", "d", "q"]].apply(
lambda x: f"({x.iloc[0]},{x.iloc[1]},{x.iloc[2]})", axis=1
)
results["seasonal_order"] = results[["P", "D", "Q", "m"]].apply(
lambda x: f"({x.iloc[0]},{x.iloc[1]},{x.iloc[2]},{x.iloc[3]})", axis=1
)
results = results[["order", "seasonal_order", "AIC"]]
results["AIC"] = results["AIC"].astype(float)
results = results.sort_values(by="AIC").drop_duplicates().reset_index(drop=True)
results.head()
| order | seasonal_order | AIC | |
|---|---|---|---|
| 0 | (1,1,1) | (0,1,1,12) | 3476.4887 |
| 1 | (1,1,1) | (0,1,2,12) | 3476.4900 |
| 2 | (1,1,1) | (1,1,1,12) | 3477.1082 |
| 3 | (2,1,1) | (0,1,1,12) | 3477.5383 |
| 4 | (1,1,2) | (0,1,1,12) | 3477.8528 |
Grid search with backtesting¶
Although less common in the context of statistical models, it is also possible to use grid search and random search to find optimal hyperparameters. It is crucial to conduct the search using a validation dataset, rather than the test dataset, to ensure an accurate and unbiased evaluation of model performance.
The training partition is split again to separate a validation partition, which is used for the hyperparameter search. The test partition remains untouched and is reserved exclusively for the final evaluation of the model's performance.
# Split Train-validation-test data
# ======================================================================================
end_train = '1979-01-01 23:59:59'
end_val = '1983-01-01 23:59:59'
print(
f"Train dates : {data.index.min()} --- {data.loc[:end_train].index.max()} "
f"(n={len(data.loc[:end_train])})"
)
print(
f"Validation dates : {data.loc[end_train:].index.min()} --- {data.loc[:end_val].index.max()} "
f"(n={len(data.loc[end_train:end_val])})"
)
print(
f"Test dates : {data.loc[end_val:].index.min()} --- {data.index.max()} "
f"(n={len(data.loc[end_val:])})"
)
# Plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data.loc[:end_train].plot(ax=ax, label='train')
data.loc[end_train:end_val].plot(ax=ax, label='validation')
data.loc[end_val:].plot(ax=ax, label='test')
ax.set_title('Monthly fuel consumption in Spain')
ax.legend();
Train dates : 1969-01-01 00:00:00 --- 1979-01-01 00:00:00 (n=121) Validation dates : 1979-02-01 00:00:00 --- 1983-01-01 00:00:00 (n=48) Test dates : 1983-02-01 00:00:00 --- 1990-01-01 00:00:00 (n=84)
# Grid search hyperparameter
# ======================================================================================
forecaster = ForecasterStats(
estimator=Arima(order=(1, 0, 0)) # Placeholder replaced in the grid search
)
param_grid = {
'order': [(0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1), (2, 1, 1)],
'seasonal_order': [(0, 0, 0,), (0, 1, 0), (1, 1, 1)],
'm': [1, 12]
}
cv = TimeSeriesFold(steps=12, initial_train_size=len(data.loc[:end_train]))
results_grid = grid_search_stats(
forecaster = forecaster,
y = data.loc[:end_val],
param_grid = param_grid,
cv = cv,
metric = 'mean_absolute_error',
return_best = True,
n_jobs = 'auto',
suppress_warnings = True,
verbose = False,
show_progress = True
)
results_grid.head(5)
Number of models compared: 30.
`Forecaster` refitted using the best-found parameters, and the whole data set:
Parameters: {'m': 12, 'order': (0, 1, 0), 'seasonal_order': (1, 1, 1)}
Backtesting metric: 17746.910354365504
| params | mean_absolute_error | m | order | seasonal_order | |
|---|---|---|---|---|---|
| 0 | {'m': 12, 'order': (0, 1, 0), 'seasonal_order'... | 17746.910354 | 12 | (0, 1, 0) | (1, 1, 1) |
| 1 | {'m': 12, 'order': (1, 1, 0), 'seasonal_order'... | 18096.558945 | 12 | (1, 1, 0) | (1, 1, 1) |
| 2 | {'m': 12, 'order': (1, 1, 1), 'seasonal_order'... | 18863.860377 | 12 | (1, 1, 1) | (1, 1, 1) |
| 3 | {'m': 12, 'order': (0, 1, 1), 'seasonal_order'... | 19640.700011 | 12 | (0, 1, 1) | (1, 1, 1) |
| 4 | {'m': 12, 'order': (2, 1, 1), 'seasonal_order'... | 21980.035173 | 12 | (2, 1, 1) | (1, 1, 1) |
The two candidate models, the one selected by grid search based on backtesting with the mean absolute error and the one selected by AutoArima based on the AIC, are compared when forecasting the next seven years in batches of 12 months.
# Backtest predictions with the best model according to grid search
# ==============================================================================
forecaster = ForecasterStats(
estimator=Arima(order=(0, 1, 0), seasonal_order=(1, 1, 1), m=12),
)
cv = TimeSeriesFold(
steps = 12,
initial_train_size = len(data.loc[:end_val]),
refit = True,
)
metric_m1, predictions_m1 = backtesting_stats(
forecaster = forecaster,
y = data,
cv = cv,
metric = 'mean_absolute_error',
suppress_warnings = True,
)
# Backtest predictions with the best model according to auto arima
# ==============================================================================
forecaster = ForecasterStats(
estimator=Arima(order=(1, 1, 1), seasonal_order=(0, 1, 1), m=12),
)
metric_m2, predictions_m2 = backtesting_stats(
forecaster = forecaster,
y = data,
cv = cv,
metric = 'mean_absolute_error',
suppress_warnings = True,
)
# Compare predictions
# ==============================================================================
print("Metric (mean_absolute_error) for grid search model:")
display(metric_m1)
print("Metric (mean_absolute_error) for auto arima:")
display(metric_m2)
fig, ax = plt.subplots(figsize=(6, 3))
data.loc[end_val:].plot(ax=ax, label='test')
predictions_m1 = predictions_m1.rename(columns={'pred': 'grid search'})
predictions_m2 = predictions_m2.rename(columns={'pred': 'auto arima'})
predictions_m1['grid search'].plot(ax=ax)
predictions_m2['auto arima'].plot(ax=ax)
ax.set_title('Backtest predictions')
ax.legend();
Metric (mean_absolute_error) for grid search model:
| mean_absolute_error | |
|---|---|
| 0 | 24786.690683 |
Metric (mean_absolute_error) for auto arima:
| mean_absolute_error | |
|---|---|
| 0 | 19548.990992 |
For this data set, the configuration identified by the Auto ARIMA technique achieves better results.
Exogenous variables¶
Within the statsmodels library, the implementation of ARIMA-SARIMAX offers a valuable feature: the ability to include exogenous variables as predictors alongside the primary time series. The only requirement is that the values of the exogenous variables must also be known during the forecast period. Exogenous variables are added using the exog argument.
💡 Tip
To learn more about exogenous variables and how to correctly manage them with skforecast visit: Exogenous variables (features) user guide.
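Although the original example does not include exogenous variables, the following sketch illustrates the idea using hypothetical month dummy variables built from the series index (deterministic seasonality). It assumes that the exog argument of ForecasterStats follows the same pattern as in the rest of skforecast:

# Sketch: ARIMA with hypothetical exogenous month dummies
# ==============================================================================
exog_train = pd.get_dummies(data_train.index.month, prefix='month', drop_first=True, dtype=float)
exog_train.index = data_train.index
exog_test = pd.get_dummies(data_test.index.month, prefix='month', drop_first=True, dtype=float)
exog_test.index = data_test.index

forecaster = ForecasterStats(
    estimator=Sarimax(order=(1, 1, 1))
)
forecaster.fit(y=data_train, exog=exog_train, suppress_warnings=True)
predictions_exog = forecaster.predict(steps=len(data_test), exog=exog_test)
predictions_exog.head(3)

Modeling the seasonality with deterministic dummies is an alternative to the seasonal ARIMA terms; the key practical point is that the exogenous values must cover the full forecast horizon.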
Using an already trained ARIMA¶
⚠️ Warning
This section only applies when using the skforecast class Sarimax as the estimator within ForecasterStats. For all other statistical models, the model must be re-trained before making new predictions.
Forecasting with an ARIMA model becomes challenging when the forecast horizon does not immediately follow the last observed value used during the training phase. This complexity is due to the moving average (MA) component, which relies on past forecast errors as predictors. Thus, to predict at time $t$, the error of the prediction at $t-1$ is needed. If that prediction is not available, the corresponding error remains unavailable. For this reason, in most cases, ARIMA models are retrained each time predictions need to be made.
Despite considerable efforts and advances to speed up the training process for these models, it is not always feasible to retrain the model between predictions, either due to time constraints or insufficient computational resources for repeated access to historical data. An intermediate approach is to feed the model with data from the last training observation to the start of the prediction phase. This technique enables the estimation of intermediate predictions and, as a result, the necessary errors. For example, imagine a situation where a model was trained 20 days ago with daily data from the past three years. When generating new predictions, only the 20 most recent values would be needed, rather than the complete historical dataset (365 * 3 + 20).
Integrating new data into the model can be complex, but the ForecasterStats class simplifies this considerably by automating the process through the last_window argument in its predict method.
# Split data Train - Last window - Test
# ==============================================================================
end_train = '1983-01-01 23:59:59'
end_last_window = '1988-01-01 23:59:59'
print(
    f"Train dates : {data.index.min()} --- {data.loc[:end_train].index.max()} "
    f"(n={len(data.loc[:end_train])})"
)
print(
    f"Last window dates : {data.loc[end_train:].index.min()} --- {data.loc[:end_last_window].index.max()} "
    f"(n={len(data.loc[end_train:end_last_window])})"
)
print(
    f"Test dates : {data.loc[end_last_window:].index.min()} --- {data.index.max()} "
    f"(n={len(data.loc[end_last_window:])})"
)
data_train = data.loc[:end_train]
data_last_window = data.loc[end_train:end_last_window]
data_test = data.loc[end_last_window:]
# Plot
# ======================================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data_train.plot(ax=ax, label='train')
data_last_window.plot(ax=ax, label='last window')
data_test.plot(ax=ax, label='test')
ax.legend();
Train dates       : 1969-01-01 00:00:00 --- 1983-01-01 00:00:00  (n=169)
Last window dates : 1983-02-01 00:00:00 --- 1988-01-01 00:00:00  (n=60)
Test dates        : 1988-02-01 00:00:00 --- 1990-01-01 00:00:00  (n=24)
The Forecaster is trained using data up until '1983-01-01 23:59:59' and will then utilize the remaining information as the last window of observations to generate new predictions. This means that the predictions will not start immediately after the last training observation, but rather after the last observation in the provided last_window.
# Create and fit ForecasterStats
# ==============================================================================
forecaster = ForecasterStats(
    estimator=Sarimax(order=(1, 1, 1), seasonal_order=(0, 1, 1, 12)),
)
forecaster.fit(y=data_train, suppress_warnings=True)
# Predict with last window
# ==============================================================================
predictions = forecaster.predict(
    steps = len(data_test),
    last_window = data_last_window,
)
predictions.head(3)
1988-02-01    492215.254120
1988-03-01    570603.230916
1988-04-01    605183.040580
Freq: MS, Name: pred, dtype: float64
# Plot predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data_train.plot(ax=ax, label='train')
data_last_window.plot(ax=ax, label='last window')
data_test.plot(ax=ax, label='test')
predictions.plot(ax=ax, label='predictions')
ax.legend();
Memory optimization¶
For production environments where you need to store many fitted models but only require forecasting capabilities (not diagnostics), you can significantly reduce memory usage with the reduce_memory() method. This is especially useful when working with large datasets or deploying models in resource-constrained environments. This method removes in-sample fitted values and residuals, which are only needed for diagnostic purposes but not for generating forecasts.
# Compare size before and after reduce_memory()
# ==============================================================================
import sys
def total_model_size(model):
    """Approximate memory footprint of a model by summing the size of its public attributes."""
    size = sys.getsizeof(model)
    for attr_name in dir(model):
        if attr_name.startswith('_'):
            continue
        try:
            attr = getattr(model, attr_name)
            size += sys.getsizeof(attr)
        except Exception:
            pass
    return size
model = Arima(order=(1, 1, 1), seasonal_order=(0, 1, 1), m=12)
model.fit(y=data_train)
model_size_before = total_model_size(model)
print(f"Memory before reduce_memory(): {model_size_before / 1024:.3f} KB")
# Reduce memory
model.reduce_memory()
model_size_after = total_model_size(model)
print(f"Memory after reduce_memory(): {model_size_after / 1024:.3f} KB")
print(f"Memory reduction: {(1 - model_size_after / model_size_before) * 100:.1f}%")
Memory before reduce_memory(): 6.812 KB
Memory after reduce_memory(): 3.843 KB
Memory reduction: 43.6%
Session information¶
import session_info
session_info.show(html=False)
-----
matplotlib          3.10.8
numpy               2.2.6
pandas              2.3.3
session_info        v1.0.1
skforecast          0.20.0
statsmodels         0.14.6
-----
IPython             9.8.0
jupyter_client      7.4.9
jupyter_core        5.9.1
notebook            6.5.7
-----
Python 3.13.11 | packaged by Anaconda, Inc. | (main, Dec 10 2025, 21:28:48) [GCC 14.3.0]
Linux-6.14.0-37-generic-x86_64-with-glibc2.39
-----
Session information updated at 2026-02-01 20:02
Citation¶
How to cite this document
If you use this document or any part of it, please acknowledge the source, thank you!
ARIMA and SARIMAX models with python by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) at https://www.cienciadedatos.net/documentos/py51-arima-sarimax-models-python.html
How to cite skforecast
If you use skforecast for a publication, we would appreciate it if you cite the published software.
Zenodo:
Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2024). skforecast (v0.20.0). Zenodo. https://doi.org/10.5281/zenodo.8382787
APA:
Amat Rodrigo, J., & Escobar Ortiz, J. (2024). skforecast (Version 0.20.0) [Computer software]. https://doi.org/10.5281/zenodo.8382787
BibTeX:
@software{skforecast,
  author  = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier},
  title   = {skforecast},
  version = {0.20.0},
  month   = {02},
  year    = {2026},
  license = {BSD-3-Clause},
  url     = {https://skforecast.org/},
  doi     = {10.5281/zenodo.8382788}
}
Did you like the article? Your support is important
Your contribution will help me to continue generating free educational content. Many thanks! 😊
This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
Allowed:
- Share: copy and redistribute the material in any medium or format.
- Adapt: remix, transform, and build upon the material.
Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial: You may not use the material for commercial purposes.
- ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.