Reputation: 8025
Given data from week 1 and week 2, I am trying to train a model to predict week 3 data.
The target label is called target.
I am confused about which features should be used to train the model, given that this problem uses a user's historical actions to predict their future actions.
Train data:
id,date,week_day,target
1,2019-01-01,1,10
1,2019-01-02,2,6
1,2019-01-03,3,7
2,2019-01-01,1,8
2,2019-01-02,1,5
2,2019-01-03,1,4
Test data (note the future dates):
id,date,week_day,target
1,2019-01-10,1,15
1,2019-01-11,2,13
1,2019-01-12,3,8
2,2019-01-10,1,7
2,2019-01-11,1,7
2,2019-01-12,1,4
1) I'm wondering whether it is correct to keep id as a feature in the training data. I know most ML problems drop the id field, but this problem is a little different because the same ids also appear in the test dataset.
2) I plan to drop the date field.
Upvotes: 0
Views: 288
Reputation: 1009
1) I'm wondering whether it is correct to keep id as a feature in the training data. I know most ML problems drop the id field, but this problem is a little different because the same ids also appear in the test dataset.
As I see it, you have several dates for the same id (in both the train and test sets). So if this id represents something related to the target, keep it. Otherwise, drop it.
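If you do keep id, one common option is to treat it as a category rather than a number, so the model does not read id 2 as "twice" id 1. The sketch below is a minimal illustration under that assumption: it one-hot encodes id and, as an alternative, computes the per-id mean of the training target (a target-encoding style feature). The helper names one_hot_ids and per_id_mean are hypothetical, and the rows are copied from the question's training data.

```python
from collections import defaultdict

# Training rows copied from the question: (id, date, week_day, target)
train = [
    (1, "2019-01-01", 1, 10),
    (1, "2019-01-02", 2, 6),
    (1, "2019-01-03", 3, 7),
    (2, "2019-01-01", 1, 8),
    (2, "2019-01-02", 1, 5),
    (2, "2019-01-03", 1, 4),
]

def one_hot_ids(ids, known_ids):
    """Hypothetical helper: encode each id as a 0/1 vector over known_ids.
    An id never seen in training becomes an all-zero vector."""
    return [[1 if i == k else 0 for k in known_ids] for i in ids]

def per_id_mean(rows):
    """Hypothetical helper: mean training target per id, reusable as a
    feature for the same id at test time."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row_id, _date, _week_day, target in rows:
        sums[row_id] += target
        counts[row_id] += 1
    return {k: sums[k] / counts[k] for k in sums}

known_ids = sorted({row[0] for row in train})
print(one_hot_ids([1, 2], known_ids))  # [[1, 0], [0, 1]]
print({k: round(v, 2) for k, v in per_id_mean(train).items()})  # {1: 7.67, 2: 5.67}
```

The per-id mean only works because the same ids appear in the test set; for a brand-new id you would need a fallback such as the global mean.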
2) I plan to drop the date field.
If you do, you will lose the year, month, week number, day number, and a holiday flag as possible features.
In addition to SARIMA, I would advise trying to fit a regression model here. Regression models sometimes work well in time-series-like tasks.
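Rather than dropping the date, you can turn it into the calendar features listed above. This is a minimal sketch using only the standard library's datetime module; the HOLIDAYS set is a hypothetical placeholder, so substitute a real holiday calendar for your region.

```python
from datetime import date

# Hypothetical holiday list; replace with a real calendar for your region.
HOLIDAYS = {date(2019, 1, 1)}

def date_features(iso_date):
    """Derive calendar features from an ISO date string such as '2019-01-10'."""
    d = date.fromisoformat(iso_date)
    _iso_year, iso_week, iso_weekday = d.isocalendar()
    return {
        "year": d.year,
        "month": d.month,
        "week_number": iso_week,           # ISO 8601 week number
        "day_of_month": d.day,
        "day_of_week": iso_weekday,        # ISO: 1 = Monday ... 7 = Sunday
        "is_holiday": int(d in HOLIDAYS),  # 1 if the date is in HOLIDAYS
    }

print(date_features("2019-01-10"))
# {'year': 2019, 'month': 1, 'week_number': 2, 'day_of_month': 10, 'day_of_week': 4, 'is_holiday': 0}
```

Note that the ISO day_of_week derived here will not necessarily match the question's own week_day column, which appears to use a different encoding.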
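The answer does not name a specific regression model, so as one hedged example, here is an ordinary least-squares linear regression fitted with numpy.linalg.lstsq on the question's data. The feature set (an intercept, an indicator for id 2, and week_day) and the helper names design_matrix and fit_linear are hypothetical choices for illustration, not a recommended model.

```python
import numpy as np

def design_matrix(ids, week_days):
    """Hypothetical minimal feature set: intercept, indicator for id 2, week_day."""
    ids = np.asarray(ids)
    week_days = np.asarray(week_days, dtype=float)
    return np.column_stack([np.ones(len(ids)), (ids == 2).astype(float), week_days])

def fit_linear(X, y):
    """Ordinary least squares via numpy.linalg.lstsq; returns the coefficient vector."""
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef

# Training rows from the question (id, week_day, target).
train_ids, train_wd, train_y = [1, 1, 1, 2, 2, 2], [1, 2, 3, 1, 1, 1], [10, 6, 7, 8, 5, 4]
# Test rows from the question (target withheld, since that is what we predict).
test_ids, test_wd = [1, 1, 1, 2, 2, 2], [1, 2, 3, 1, 1, 1]

coef = fit_linear(design_matrix(train_ids, train_wd), train_y)
pred = design_matrix(test_ids, test_wd) @ coef
print(np.round(pred, 2))
```

With only six training rows this is purely illustrative; in practice you would add the date-derived features and evaluate against the held-out week 3 targets.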
Upvotes: 1
Reputation: 2838
Your data has far too few features. You can try multiple models, such as SARIMA as suggested by Pierre, but with only these features you might struggle. I would suggest plotting a correlation matrix to see whether there is any correlation between the inputs and the output. If there isn't, no model is likely to help you; only if there is a correlation will a model be able to learn it and generalize.
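Before plotting, you can compute the correlation matrix numerically. This sketch uses numpy.corrcoef on the question's training rows (week_day, id, target) and prints each input's correlation with the target. Keep in mind that this is Pearson correlation, which only measures linear relationships, so a value near zero does not rule out a nonlinear one.

```python
import numpy as np

# Columns: week_day, id, target, taken from the question's training rows.
names = ["week_day", "id", "target"]
data = np.array([
    [1, 1, 10],
    [2, 1, 6],
    [3, 1, 7],
    [1, 2, 8],
    [1, 2, 5],
    [1, 2, 4],
], dtype=float)

# rowvar=False: each column is a variable, each row is an observation.
corr = np.corrcoef(data, rowvar=False)

# Last column of the matrix holds each variable's correlation with target.
for name, r in zip(names[:-1], corr[:-1, -1]):
    print(f"corr({name}, target) = {r:+.2f}")
```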
This link can be helpful if you don't know how to plot a correlation matrix: https://seaborn.pydata.org/examples/many_pairwise_correlations.html
This link can help you make sense of a correlation matrix if you are not familiar with them: https://machinelearningmastery.com/how-to-use-correlation-to-understand-the-relationship-between-variables/
If you are unable to understand something from the links, feel free to comment.
Upvotes: 0