Create hundreds of TimeSeries & Train-/Testsets with loop or function

Question

MWE

I have a dataset with a little more than 1 million rows, containing several hundred time series. Here is a simplified MWE of this data:

import pandas as pd

df = pd.DataFrame({"dtime":["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-01", "2022-01-02", "2022-01-03",
                           "2022-01-01", "2022-01-02", "2022-01-03","2022-01-01", "2022-01-02", "2022-01-03"],
                   "Type":["A","A","A","B","B","B","C","C","C","D","D","D"],
                   "Value":[1,2,3,4,6,8,1,5,8,3,1,2]})

+----+------------+--------+---------+
|    | dtime      | Type   |   Value |
|----+------------+--------+---------|
|  0 | 2022-01-01 | A      |       1 |
|  1 | 2022-01-02 | A      |       2 |
|  2 | 2022-01-03 | A      |       3 |
|  3 | 2022-01-01 | B      |       4 |
|  4 | 2022-01-02 | B      |       6 |
|  5 | 2022-01-03 | B      |       8 |
|  6 | 2022-01-01 | C      |       1 |
|  7 | 2022-01-02 | C      |       5 |
|  8 | 2022-01-03 | C      |       8 |
|  9 | 2022-01-01 | D      |       3 |
| 10 | 2022-01-02 | D      |       1 |
| 11 | 2022-01-03 | D      |       2 |
+----+------------+--------+---------+

Type identifies the time series group: A is one time series, B another, and so on.

Goal

I want to train a time series neural network on multiple series at once (such as the models provided by unit8's darts package):

from darts.models import NBEATSModel
model = NBEATSModel(input_chunk_length=50, output_chunk_length=50, n_epochs=25)
model.fit([train1, train2, train3, train4])

For this, I need to separate the data by Type, convert each group into a TimeSeries, and finally split each series into train/test sets, like this:

from darts import TimeSeries

split_date = "2022-01-02"

series1 = TimeSeries.from_dataframe(df[df["Type"] == "A"], "dtime", "Value", freq="D", fillna_value=0)
series2 = TimeSeries.from_dataframe(df[df["Type"] == "B"], "dtime", "Value", freq="D", fillna_value=0)
series3 = TimeSeries.from_dataframe(df[df["Type"] == "C"], "dtime", "Value", freq="D", fillna_value=0)
series4 = TimeSeries.from_dataframe(df[df["Type"] == "D"], "dtime", "Value", freq="D", fillna_value=0)
train1, val1 = series1.split_before(pd.Timestamp(split_date))
train2, val2 = series2.split_before(pd.Timestamp(split_date))
train3, val3 = series3.split_before(pd.Timestamp(split_date))
train4, val4 = series4.split_before(pd.Timestamp(split_date))
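
The split boundary is easy to get wrong. Assuming `split_before` puts every point strictly before the split date into the first part and the split date itself onward into the second, the following pandas-only sketch reproduces that split for Type A of the MWE without darts (the date-indexed pandas Series is only a stand-in for a darts TimeSeries):

```python
import pandas as pd

# Same data as the MWE
df = pd.DataFrame({
    "dtime": ["2022-01-01", "2022-01-02", "2022-01-03"] * 4,
    "Type": ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3,
    "Value": [1, 2, 3, 4, 6, 8, 1, 5, 8, 3, 1, 2],
})
split_date = pd.Timestamp("2022-01-02")

# Type A as a date-indexed Series (stand-in for TimeSeries.from_dataframe)
group_a = df[df["Type"] == "A"]
series_a = group_a.set_index(pd.to_datetime(group_a["dtime"]))["Value"]

# Points strictly before the split date -> train; the split date onward -> validation
train_a = series_a[series_a.index < split_date]
val_a = series_a[series_a.index >= split_date]

print(train_a.tolist())  # [1]
print(val_a.tolist())    # [2, 3]
```

With this MWE and `split_date = "2022-01-02"`, each training part therefore holds a single day, which is far shorter than the `input_chunk_length=50` used in the model call; the real data needs enough history before the split date.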

But since the real-world data has far more than four Types, doing this manually would be overkill, so I'm looking for a loop or a function that creates these series, train, and test TimeSeries objects.

In addition to the numbered series, train, and test objects, I want to create a list containing every trainX object, like:

ts_list = [train1, train2, train3, train4]

Does anybody have an idea how I can do this? I'm happy to hear any suggestion.
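
Instead of generating variables named series1, train1, val1, and so on, one option is to key everything by Type in dictionaries and build the list from them. The sketch below is pandas-only so it runs without darts; `split_by_type` is a hypothetical helper name, and the date-indexed pandas Series stands in for a darts TimeSeries:

```python
import pandas as pd

# Same data as the MWE
df = pd.DataFrame({
    "dtime": ["2022-01-01", "2022-01-02", "2022-01-03"] * 4,
    "Type": ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3,
    "Value": [1, 2, 3, 4, 6, 8, 1, 5, 8, 3, 1, 2],
})


def split_by_type(frame, split_date):
    """Hypothetical helper: one series per Type plus its train/validation parts.

    Returns three dicts keyed by Type instead of numbered variables.
    """
    cutoff = pd.Timestamp(split_date)
    series, train, val = {}, {}, {}
    for ts_type, group in frame.groupby("Type"):  # groupby sorts the keys by default
        # pandas stand-in for TimeSeries.from_dataframe(group, "dtime", "Value", ...)
        s = group.set_index(pd.to_datetime(group["dtime"]))["Value"]
        series[ts_type] = s
        train[ts_type] = s[s.index < cutoff]   # strictly before the split date
        val[ts_type] = s[s.index >= cutoff]    # split date onward
    return series, train, val


series, train, val = split_by_type(df, "2022-01-02")
ts_list = list(train.values())  # plays the role of [train1, train2, train3, train4]
print(list(train))              # ['A', 'B', 'C', 'D']
```

With darts, the line that builds `s` would call `TimeSeries.from_dataframe` and the two slices would become `s.split_before(cutoff)`; a single Type is still reachable as `train["B"]`, and `ts_list` can be passed to `model.fit`. Recent darts releases also provide `TimeSeries.from_group_dataframe`, which returns one series per group in a single call; check whether your installed version has it.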

Léo Beaucourt · Accepted Answer

Have you tried using a groupby over the Type column in a loop?

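import pandas as pd
from darts import TimeSeries

train_list = []
for ts_type, group in df.groupby("Type"):
    # Build one daily TimeSeries per Type, filling missing values with 0
    series = TimeSeries.from_dataframe(group, "dtime", "Value", freq="D", fillna_value=0)
    # Split at split_date (defined in the question) and keep the training part
    train, val = series.split_before(pd.Timestamp(split_date))
    train_list.append(train)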

But with a lot of data, looping over the groups in pandas could become computationally expensive, so a better solution might be found with other tools (Spark, for example).
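
Before switching to Spark, it may be worth moving part of the work out of the Python loop. The sketch below (pandas only; whether it is actually faster on your data is an assumption to check by profiling) parses the dates and applies the date split once for the whole frame, so the loop only has to turn each group into a series. The date-indexed pandas Series again stands in for a darts TimeSeries:

```python
import pandas as pd

# Same data as the MWE
df = pd.DataFrame({
    "dtime": ["2022-01-01", "2022-01-02", "2022-01-03"] * 4,
    "Type": ["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3,
    "Value": [1, 2, 3, 4, 6, 8, 1, 5, 8, 3, 1, 2],
})
split_date = pd.Timestamp("2022-01-02")

# Parse the date strings once for the whole frame rather than once per group
df["dtime"] = pd.to_datetime(df["dtime"])

# One vectorized comparison splits every Type at once:
# rows strictly before split_date -> train, split_date onward -> validation
is_train = df["dtime"] < split_date
train_df = df[is_train]
val_df = df[~is_train]

# The remaining loop only converts each group into a series
train_list = [g.set_index("dtime")["Value"] for _, g in train_df.groupby("Type")]
val_list = [g.set_index("dtime")["Value"] for _, g in val_df.groupby("Type")]

print([s.tolist() for s in train_list])  # [[1], [4], [1], [3]]
```

One caveat of splitting first: a Type with no rows before the split date simply disappears from `train_list`, so `train_list` and `val_list` can end up with different lengths and no longer line up by position. Keying by Type avoids that.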
