jxn

Reputation: 8025

Best way to train a regression model given time series data

Given data from week 1 and week 2, I am trying to train a model to predict on week 3 data.

The target label is called target.

I am unsure which features to use for training, given that this problem uses a user's historical actions to predict their future actions.

Train data

id,date,week_day,target
1,2019-01-01,1,10
1,2019-01-02,2,6
1,2019-01-03,3,7
2,2019-01-01,1,8
2,2019-01-02,1,5
2,2019-01-03,1,4

Test data (note the future dates)

id,date,week_day,target
1,2019-01-10,1,15
1,2019-01-11,2,13
1,2019-01-12,3,8
2,2019-01-10,1,7
2,2019-01-11,1,7
2,2019-01-12,1,4

1) Is it correct to keep id as a feature in the training data? I know most ML problems drop the id field, but this problem is a little different in that the same ids appear in the test dataset.

2) I plan to drop the date field.

Upvotes: 0

Views: 288

Answers (3)

andrewchauzov

Reputation: 1009

1) Is it correct to keep id as a feature in the training data? I know most ML problems drop the id field, but this problem is a little different in that the same ids appear in the test dataset.

As I see it, the same ids appear with different dates in both the train and test sets. So if the id represents something related to the target (for example, a particular user), keep it. Otherwise, drop it.
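If the id is kept, one common option is to treat it as a categorical feature rather than a number, so that a model does not read id 2 as "twice" id 1. The following is a minimal sketch using pandas, with the training rows copied from the question. One-hot encoding is an assumption made here for illustration, not something the question requires.

```python
import pandas as pd

# Training rows copied from the question.
train = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "date": ["2019-01-01", "2019-01-02", "2019-01-03"] * 2,
    "week_day": [1, 2, 3, 1, 1, 1],
    "target": [10, 6, 7, 8, 5, 4],
})

# One-hot encode id: each distinct id becomes its own indicator column.
encoded = pd.get_dummies(train, columns=["id"], prefix="id")

print(encoded.columns.tolist())
```

This only works if every id in the test set also appears in the training set, which is the case in the sample data.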

2) I plan to drop the date field.

If you do, you will lose the year, month, week number, day number, and holiday flag as possible features.
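As a rough sketch of how those features could be derived before the date column is dropped, assuming pandas and the training rows from the question. The function name add_date_features and the HOLIDAYS list are hypothetical and only there for illustration; a real holiday calendar depends on the country and the data.

```python
import pandas as pd

# Training rows copied from the question.
train = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "date": ["2019-01-01", "2019-01-02", "2019-01-03"] * 2,
    "target": [10, 6, 7, 8, 5, 4],
})

# Hypothetical holiday list, used only to illustrate a holiday flag.
HOLIDAYS = pd.to_datetime(["2019-01-01"])

def add_date_features(df):
    """Return a copy of df with calendar features derived from 'date'."""
    out = df.copy()
    dates = pd.to_datetime(out["date"])
    out["year"] = dates.dt.year
    out["month"] = dates.dt.month
    out["week_of_year"] = dates.dt.isocalendar().week.astype(int)
    out["day_of_month"] = dates.dt.day
    out["day_of_week"] = dates.dt.dayofweek  # Monday = 0
    out["is_holiday"] = dates.isin(HOLIDAYS).astype(int)
    return out

features = add_date_features(train)
print(features.drop(columns=["date"]))
```

After this step the raw date column can be dropped without losing the calendar information.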

In addition to SARIMA, I would advise trying to fit a regression model here. Sometimes they work in time-series-like tasks.

Upvotes: 1

anand_v.singh

Reputation: 2838

Your data has far too few features. You can try multiple models, such as SARIMA as suggested by Pierre, but with only those features you might struggle. I suggest plotting a correlation matrix to see whether there is any correlation between the inputs and the output. If there isn't, no model can help you; only if there is a correlation between the features and the target will a model be able to learn it and generalize.

This link can be helpful if you don't know how to plot a correlation matrix: https://seaborn.pydata.org/examples/many_pairwise_correlations.html

This link can help you make sense of a correlation matrix if you are not familiar with them: https://machinelearningmastery.com/how-to-use-correlation-to-understand-the-relationship-between-variables/
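A quick way to check this without plotting is to compute the matrix directly. This is a minimal sketch assuming pandas and the numeric columns from the question's training data; with only six rows the numbers are illustrative at best.

```python
import pandas as pd

# Numeric training columns copied from the question.
train = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "week_day": [1, 2, 3, 1, 1, 1],
    "target": [10, 6, 7, 8, 5, 4],
})

# Pairwise Pearson correlation between all numeric columns.
corr = train.corr()

# Correlation of each column with the target, strongest first.
print(corr["target"].sort_values(ascending=False))
```

Pearson correlation only captures linear relationships, so a low value does not rule out a nonlinear dependence.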

If you are unable to understand something from the links, feel free to comment.

Upvotes: 0

Pierre S.

Reputation: 1129

It looks like your problem can be seen as a time series forecast. You have seasonality in your data. Instead of performing regression, you can try an algorithm such as SARIMA.

Upvotes: 1
