Reputation: 73
I have a dataset containing water samples collected from different locations. For example, ABC1 is a water sample taken from a river in Arizona and ABC2 is a water sample taken from a river in Boston. Both are rivers and share the same feature columns (pH, temp, etc.), but because they are in different locations, the changes in their features are specific to each one. My goal is to create a single river model, because I do not have enough data to create individual models. There are 11 columns in total whose next-month values I want to predict. My dataset looks like this:
Date Sample_Name pH temp etc...
2009-01-01 ABC1 7.2 12
2009-01-02 ABC2 5.5 11
.
.
2009-01-02 ABC1 7.2 10
2009-01-02 ABC2 7.3 10
.
.
2013-06-02 ABC2 6.5 22
2013-06-04 ABC1 6.5 22
.
2015-01-05 ABC1 8.9 13
2015-01-05 ABC4 8.8 13
I want to feed every sample and its sequence to an LSTM model. For example, every measurement (row) of ABC1 must be given to the model as one sequence, or batch. Is it possible to do this kind of data preparation using TimeseriesGenerator? How can I prepare my data so that I can feed it to the model as described? Also, does it help to sort the dataset by date and sample name (alphabetically)? I am trying to achieve something like this
I want to generate data using:
from keras.preprocessing.sequence import TimeseriesGenerator
n_timesteps = 2
n_features = 10
batch_size = 5
generator = TimeseriesGenerator(df, df, length=n_timesteps, sampling_rate=10, stride=1, batch_size=batch_size)
The simple LSTM model that I want to feed my data in:
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.utils import Sequence
model = Sequential()
model.add(LSTM(n_features, activation='relu', input_shape=(n_timesteps, n_features)))
model.add(Dense(10))
model.compile(optimizer='adam', loss='mse', metrics = ['accuracy'])
Upvotes: 1
Views: 319
Reputation: 8219
Looking at the docs, tf.keras.preprocessing.sequence.TimeseriesGenerator cannot take a dictionary as the first argument. The 'slice' error is just a manifestation of that: the function tries to take slices of the first argument (a dict) and fails. Again, from the docs:
Arguments: data: Indexable generator (such as list or Numpy array) containing consecutive data points (timesteps).
So perhaps you want to pass input_dict['ABC1']
or possibly input_dict['ABC1'].values (the underlying NumPy array, assuming each entry is a pandas DataFrame).
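If the goal is windows that never mix rows from two locations, one option is to window each sample's array separately and concatenate the results afterwards. Below is a minimal sketch in plain NumPy rather than TimeseriesGenerator itself. The helper make_windows and the numbers in input_dict are made up for illustration, and I am assuming each entry is a 2-D array of feature rows already sorted by date:

```python
import numpy as np

def make_windows(series, n_timesteps):
    """Return (X, y): X[k] holds n_timesteps consecutive rows, y[k] is the row right after them."""
    series = np.asarray(series, dtype=float)
    X, y = [], []
    for i in range(n_timesteps, len(series)):
        X.append(series[i - n_timesteps:i])  # the n_timesteps rows before row i
        y.append(series[i])                  # row i is the prediction target
    return np.array(X), np.array(y)

# Hypothetical data: one array per sample name, rows sorted by date,
# columns are features (only pH and temp here).
input_dict = {
    "ABC1": np.array([[7.2, 12.0], [7.2, 10.0], [6.5, 22.0], [8.9, 13.0]]),
    "ABC2": np.array([[5.5, 11.0], [7.3, 10.0], [6.5, 22.0]]),
}

n_timesteps = 2
X_parts, y_parts = [], []
for name, values in input_dict.items():
    if len(values) <= n_timesteps:
        continue  # too few rows to form even one window for this sample
    X_s, y_s = make_windows(values, n_timesteps)
    X_parts.append(X_s)
    y_parts.append(y_s)

X = np.concatenate(X_parts)  # shape: (n_windows, n_timesteps, n_features)
y = np.concatenate(y_parts)  # shape: (n_windows, n_features)
print(X.shape, y.shape)
```

Because each sample is windowed on its own, no window straddles the boundary between ABC1 and ABC2. X then has the (samples, timesteps, features) layout that an LSTM layer with input_shape=(n_timesteps, n_features) expects.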
Upvotes: 2