Reputation: 2478
I was reading the tutorial on Multivariate Time Series Forecasting with LSTMs in Keras https://machinelearningmastery.com/multivariate-time-series-forecasting-lstms-keras/#comment-442845
I followed the entire tutorial and got stuck on the following problem:
In this tutorial, the train and test splits have 8 features ('pollution', 'dew', 'temp', 'press', 'wnd_dir', 'wnd_spd', 'snow', 'rain') at step 't-1', while the output feature is 'pollution' at the current step 't'. This is because the dataset is framed as a supervised learning problem: predict the 'pollution' at the current hour/time step 't', given the pollution and weather measurements at the prior hour/time step 't-1'.
After fitting the model to the training and testing data splits, what if I want to make predictions for a new dataset that has only 7 features? It does not contain the 'pollution' feature, and I explicitly want to predict just this one feature using the other 7.
How do I handle such a situation (the remaining 7 features stay the same)?
Thanks for your help!
Edit: Assume that my dataset has the following 3 features while training/fitting the model: shop_number, item_number, number_of_units_sold.
AFTER I have trained the LSTM model, I get a dataset that has only the features 'shop_number' AND 'item_number'. The dataset DOES NOT have 'number_of_units_sold'.
The 'input_shape' argument in 'LSTM' has 1 as time step and 3 as features while training. But while predicting, I have 1 time step but ONLY 2 features (as 'number_of_units_sold' is what I have to predict).
So how should I proceed?
Upvotes: 0
Views: 1325
Reputation: 86620
If pollution is the last feature:
X = original_data[:,:,:-1]
Y = original_data[:,:,-1:]
If pollution is the first feature:
X = original_data[:,:,1:]
Y = original_data[:,:,:1]
Otherwise:
import numpy as np
i = index_of_pollution_feature
X = np.concatenate([original_data[:,:,:i], original_data[:,:,i+1:]], axis=-1)
Y = original_data[:,:,i:i+1]
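As a quick check of the slicing above, here is a minimal sketch on a dummy array. The shape (4 samples, 1 time step, 8 features) and the target index 3 are assumptions chosen only for illustration; the indexing itself assumes the data is laid out as (samples, steps, features).

```python
import numpy as np

# Hypothetical data: 4 samples, 1 time step, 8 features (shape is an assumption).
original_data = np.arange(4 * 1 * 8, dtype=float).reshape(4, 1, 8)

# Hypothetical position of the target feature among the 8 columns.
i = 3

# Inputs: every feature except the target, joined along the feature axis.
X = np.concatenate([original_data[:, :, :i], original_data[:, :, i + 1:]], axis=-1)

# Target: only the chosen feature, keeping the trailing feature dimension.
Y = original_data[:, :, i:i + 1]

print(X.shape, Y.shape)  # (4, 1, 7) (4, 1, 1)
```

Slicing with `i:i+1` rather than plain `i` keeps Y three-dimensional, so it lines up with a model that outputs one value per time step.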
Make a model with return_sequences=True, stateful=False, and that's it.
Don't use Flatten, Global poolings or anything that removes the steps dimension.
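A model along these lines might look like the sketch below. It is only an illustration under stated assumptions: the layer size (32), the Dense(1) output head, the 'mae' loss, and the random training arrays are all made up here and are not taken from the tutorial or the question.

```python
import numpy as np
from tensorflow import keras

n_features = 7  # the input features, with the target feature removed

model = keras.Sequential([
    # None for the steps dimension lets the model accept any sequence length.
    keras.Input(shape=(None, n_features)),
    # return_sequences=True keeps the steps dimension in the output.
    keras.layers.LSTM(32, return_sequences=True, stateful=False),
    # Dense acts on the last axis only, giving one predicted value per step.
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mae")

# Hypothetical random data, used only to show the expected shapes.
X = np.random.rand(16, 5, n_features).astype("float32")
Y = np.random.rand(16, 5, 1).astype("float32")
model.fit(X, Y, epochs=1, verbose=0)

pred = model.predict(X[:2], verbose=0)
print(pred.shape)  # (2, 5, 1)
```

Because no layer removes the steps dimension, the output keeps the shape (samples, steps, 1), matching the Y built from the target feature alone.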
If you don't have any pollution data at all for training, then you can't train the model to predict it.
Upvotes: 0