Reputation: 4545
For simplicity, say that I am attempting to predict the following day of a sequence of single-valued variables, so my dataset would be in the form of:
input label
x1 x2
x2 x3
x3 x4
... ...
xt xt+1
However, my data contains the same kind of time sequence for many different users, so it is in the following form:
input label
u1x1 u1x2
u1x2 u1x3
u1x3 u1x4
... ...
u1xt u1xt+1
u2x1 u2x2
u2x2 u2x3
u2x3 u2x4
... ...
u2xt u2xt+1
... ...
unx1 unx2
unx2 unx3
unx3 unx4
... ...
unxt unxt+1
What is an acceptable way to structure this data and feed it into DAI such that it is not treated as one entire long sequence, but rather a bunch of not directly related sequences parallel in time?
Edit: The data has a 'UserID' column. Can DAI automatically use this to overcome the problem I am explaining?
Upvotes: 1
Views: 165
Reputation: 5778
To format your data for forecasting, you need to aggregate your data for each group of interest and for a specific time period (in your case one day).
So if your forecast horizon is one day, you need to aggregate your single-valued variable by user and by day, so that the target (label) is a total amount per user per day. You can find documentation on how to set up your data for Driverless AI here and here.
EDIT in response to comment:
Here is another example to explain the expected data format using the assumption that each user should be aggregated at the day level:
If you have one day's worth of data for 5 users, your dataset should have only 5 rows, but if you have 10 days' worth of data for 5 users, you should have 50 rows.
Then, when you set up your experiment in Driverless AI, you would set your Time Group to the User column.
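As a rough illustration of the aggregation described above, here is a minimal pandas sketch that collapses raw readings to one row per user per day. The column names `timestamp`, `value` and `date`, the sample values, and the choice of `sum` as the aggregate are assumptions made for the example; only `UserID` comes from the question.

```python
import pandas as pd

# Hypothetical raw data: several readings per user per day.
raw = pd.DataFrame({
    "UserID": ["u1", "u1", "u1", "u2", "u2", "u2"],
    "timestamp": pd.to_datetime([
        "2020-01-01 08:00", "2020-01-01 17:30", "2020-01-02 09:15",
        "2020-01-01 10:00", "2020-01-02 11:45", "2020-01-02 20:05",
    ]),
    "value": [1.0, 2.0, 4.0, 3.0, 5.0, 6.0],
})

# Collapse to one row per (user, day). Summing is an assumption here;
# use whichever aggregate matches your target (mean, last, count, ...).
daily = (
    raw.assign(date=raw["timestamp"].dt.floor("D"))
       .groupby(["UserID", "date"], as_index=False)["value"]
       .sum()
       .sort_values(["UserID", "date"])
       .reset_index(drop=True)
)

# Sanity check: with no missing days, rows = users x days
# (2 users x 2 days here, 5 users x 10 days = 50 in the example above).
n_users = daily["UserID"].nunique()
n_days = daily["date"].nunique()
assert len(daily) == n_users * n_days

print(daily)
```

The resulting long-format table keeps every user's series in the same file, with `date` available as the time column and `UserID` as the grouping column, so each user's rows can be treated as a separate sequence rather than one long one.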
Upvotes: 1