Reputation: 1
I have a dataset of around 500 time series covering a period of 2.5 years at a granularity of 1 day per series. This amounts to roughly 1 million data points. I want to forecast 2 weeks ahead at 1-day granularity for each time series. There might be correlation among these 500 series. After ensuring that I have data for every timestamp, we feed these 500 time series to AutoML, where each series is identified by a "series identifier". So our input to AutoML (forecasting) is: timestamp, series identifier, features, and target value. I have 30 features, which are a combination of categorical and numerical. With this setup, AutoML takes more than 20 hours to train, which is not cost-effective for me.
Please help me optimize this.
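For reference, here is a minimal sketch of the long ("stacked") input format I am describing: one row per (timestamp, series) pair. The column names and the toy data are placeholders, not my real schema, and I use only 3 series here instead of 500.

```python
import pandas as pd
import numpy as np

rng = np.random.default_rng(0)
dates = pd.date_range("2021-01-01", periods=900, freq="D")  # ~2.5 years of days
series_ids = [f"series_{i:03d}" for i in range(3)]          # 500 in the real setup

frames = []
for sid in series_ids:
    frames.append(pd.DataFrame({
        "timestamp": dates,
        "series_id": sid,                                    # the "series identifier"
        "cat_feature": rng.choice(["a", "b"], size=len(dates)),  # 1 of ~30 features
        "num_feature": rng.normal(size=len(dates)),
        "target": rng.normal(size=len(dates)).cumsum(),
    }))

long_df = pd.concat(frames, ignore_index=True)
print(long_df.shape)  # (2700, 5): 3 series x 900 days
```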
Upvotes: 0
Views: 331
Reputation: 3945
AutoML is a black box. There is little you can do to optimize training time, because AutoML does feature engineering under the hood and tries very hard not to overfit your data.
You have just two options here:
Train a model on a smaller dataset containing only the most important time series (it will still take time, because AutoML will have to fight not to overfit your dataset).
Remove the time series identifier if that makes sense for your data. This gives AutoML a better chance of not overfitting and might produce a result sooner.
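The first option can be sketched as below. "Important" is approximated here by total absolute target volume, which is only one possible criterion; substitute whatever metric matches your business case. The column names and toy data are assumptions, not part of your actual pipeline.

```python
import pandas as pd
import numpy as np

# Toy long-format dataset: 10 series, 100 rows each.
rng = np.random.default_rng(1)
long_df = pd.DataFrame({
    "series_id": np.repeat([f"s{i}" for i in range(10)], 100),
    "target": rng.normal(size=1000),
})

# Keep only the top-K series by total absolute target volume.
K = 3
top_ids = (long_df.groupby("series_id")["target"]
                  .apply(lambda s: s.abs().sum())
                  .nlargest(K)
                  .index)
small_df = long_df[long_df["series_id"].isin(top_ids)]

print(small_df["series_id"].nunique())  # 3
print(len(small_df))                    # 300 rows instead of 1000
```

Feeding `small_df` instead of the full dataset to AutoML should reduce training time roughly in proportion to the rows removed, at the cost of losing forecasts (and any cross-series signal) for the dropped series.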
Please remember you're tweaking a black box. Your mileage will vary.
Upvotes: 1