AndreasInfo

Reputation: 1227

How to capture trend in time-series data for forecasting using scikit-learn's LinearRegression()

I have read some literature about time series forecasting with ML. I get the concepts of

  1. trend
  2. seasonality
  3. cyclic patterns
  4. noise

I would like to start with scikit-learn's LinearRegression() to make predictions. If I understand correctly, I can capture seasonality and cyclic patterns with feature engineering, for example day_of_week, month, or season columns. What I don't understand is how to capture the trend in the data. Do I need lag features, or a column that holds differences instead of totals?

Upvotes: 0

Views: 2470

Answers (2)

Prayson W. Daniel

Reputation: 15578

Check out sktime together with sklearn for forecasting. Between them you can do most time-series analysis. This example, from my gist, shows how you can combine two models in an ensemble to predict trends:

from pytrends.request import TrendReq
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neighbors import KNeighborsRegressor
from sktime.forecasting.base import ForecastingHorizon
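# NOTE: ReducedForecaster comes from an older sktime release; newer releases
# provide make_reduction in the same module instead, so check your installed version.
from sktime.forecasting.compose import EnsembleForecaster, ReducedForecaster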
from sktime.forecasting.model_selection import temporal_train_test_split
from sktime.performance_metrics.forecasting import smape_loss
from sktime.utils.plotting import plot_series


# fetch cyberbullying data from Google trends
pytrend = TrendReq(hl="en-US")
pytrend.build_payload(
    kw_list=[
        "cyberbullying",
    ]
)
cyberbullying_df = pytrend.interest_over_time()

# transform the DataFrame column into a univariate Series with a weekly PeriodIndex
fow = cyberbullying_df["cyberbullying"].to_period(freq="W")

y_train, y_test = temporal_train_test_split(fow, test_size=36)
fh = ForecastingHorizon(y_test.index, is_relative=False)

# forecaster ensemble of knn and gradient boosting regressor
forecaster = EnsembleForecaster(
    [
        (
            "knn",
            ReducedForecaster(
                regressor=KNeighborsRegressor(n_neighbors=1),
                window_length=52,
                strategy="recursive",
                scitype="regressor",
            ),
        ),
        (
            "gboost",
            ReducedForecaster(
                regressor=GradientBoostingRegressor(n_estimators=100, random_state=42),
                window_length=52,
                strategy="recursive",
                scitype="regressor",
            ),
        ),
    ]
)

# train the ensemble forecaster and forecast over the test horizon
forecaster.fit(y_train)
y_pred = forecaster.predict(fh)

sktime also lets you use Facebook's Prophet. Give it a go; it is my go-to tool for time-series analysis: sktime
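To connect this back to the question about lag features: the reduction used above turns forecasting into ordinary tabular regression over a sliding window of past values. Here is a rough sketch of that idea with plain scikit-learn and LinearRegression. It is my own illustration, not sktime's implementation; the synthetic series and the helper names make_lag_matrix and recursive_forecast are made up for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def make_lag_matrix(series, window_length):
    """Build (X, y): each row of X holds the `window_length` values before y."""
    X = np.array(
        [series[i : i + window_length] for i in range(len(series) - window_length)]
    )
    y = series[window_length:]
    return X, y


def recursive_forecast(model, history, window_length, steps):
    """Predict one step at a time, feeding each prediction back in as a lag."""
    window = list(history[-window_length:])
    preds = []
    for _ in range(steps):
        next_value = model.predict(np.array(window).reshape(1, -1))[0]
        preds.append(next_value)
        window = window[1:] + [next_value]  # slide the window forward
    return np.array(preds)


# Synthetic weekly-style series (assumption): linear trend + 52-step season + noise.
rng = np.random.default_rng(0)
t = np.arange(260)
series = 0.1 * t + 5.0 * np.sin(2 * np.pi * t / 52) + rng.normal(scale=0.3, size=t.size)

window_length = 52
y_train, y_test = series[:-36], series[-36:]

X, y = make_lag_matrix(y_train, window_length)
model = LinearRegression().fit(X, y)
y_pred = recursive_forecast(model, y_train, window_length, steps=len(y_test))
```

Because the lags carry the recent level of the series, a linear model on lag features can follow a trend without an explicit time column. Differencing the series first is another option, but then the predictions have to be summed back up to get totals.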

Upvotes: 1

Eddy Piedad

Reputation: 366

Linear regression fits a linear model to the data, basically a function Y = W*X with coefficients w = (w1, …, wp), chosen to minimize the residual sum of squares between the true values and the corresponding predicted values.
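Applied to trend, a minimal sketch looks like this (my own illustration on synthetic data, so the numbers are assumptions): if the only feature is a running time index, the fitted coefficient is the slope of the trend, and forecasting just means extending the index.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic series (assumption): linear trend plus Gaussian noise.
rng = np.random.default_rng(0)
n = 100
t = np.arange(n)
y = 5.0 + 0.5 * t + rng.normal(scale=1.0, size=n)

# The time index is the only feature, so the fitted slope estimates the trend.
X = t.reshape(-1, 1)
model = LinearRegression().fit(X, y)

# Forecast the next 10 steps by extending the time index past the training data.
t_future = np.arange(n, n + 10).reshape(-1, 1)
y_future = model.predict(t_future)
```

Seasonal columns such as day_of_week or month can be added next to the time index as extra features; the time index then picks up the trend and the other columns pick up the seasonal pattern.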

Time-series data is often not linear. To capture seasonality and cyclic patterns, I would suggest using a polynomial function of degree n > 2. You can also use more advanced regression models such as support vector machines and random forests.
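A polynomial fit can still use LinearRegression, because the model stays linear in its coefficients. The sketch below is an assumption-laden example on synthetic data with a curved trend; degree=3 is chosen only for illustration, and the time index is rescaled to [0, 1] so the higher powers stay numerically well behaved.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic series (assumption): quadratic trend plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.arange(120, dtype=float)
y = 10.0 + 0.02 * t**2 + rng.normal(scale=2.0, size=t.size)

# Rescale the time index to [0, 1] before raising it to higher powers.
X = (t / t.max()).reshape(-1, 1)

# Baseline: straight-line trend.
linear = LinearRegression().fit(X, y)

# Polynomial trend: expand the time feature, then fit ordinary least squares.
poly = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    LinearRegression(),
).fit(X, y)

r2_linear = linear.score(X, y)  # in-sample R^2 of the straight line
r2_poly = poly.score(X, y)      # in-sample R^2 of the polynomial fit
```

Keep in mind that polynomials can extrapolate badly outside the training range, so check forecasts on a held-out tail of the series before trusting a high degree.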

But you can certainly start with a linear model and switch to more advanced models later, once you run into the limitations of linear models.

Upvotes: 1
