Reputation: 75
This is the function I use to build my LSTM inputs.
It takes two arguments: rucio_data (a 2-D NumPy array) and durations (a 1-D NumPy array). The shape of rucio_data is around (2000000, 9).
import numpy as np

def prepare_model_inputs(rucio_data, durations, num_timesteps=50):
    print(rucio_data.shape[0], durations.shape)
    n_examples = rucio_data.shape[0]
    n_batches = n_examples - num_timesteps + 1
    print('Total data points for training/testing: {} of {} timesteps each.'.format(n_batches, num_timesteps))
    inputs = []
    outputs = []
    for i in range(n_batches):
        inputs.append(rucio_data[i:i + num_timesteps])    # one window of inputs
        outputs.append(durations[i + num_timesteps - 1])  # target at the window's end
    print(len(inputs))
    inputs = np.stack(inputs)
    outputs = np.stack(outputs)
    print(inputs.shape, outputs.shape)
    return inputs, outputs
The problem is that my system runs out of memory at the inputs = np.stack(inputs) step.
I need a more memory-efficient way of doing this.
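(For reference: the stacking can be avoided entirely, because every window is a slice of the same array. A minimal sketch assuming NumPy >= 1.20, whose sliding_window_view returns all overlapping windows as a zero-copy view; the function name prepare_model_inputs_view is just for illustration:)

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def prepare_model_inputs_view(rucio_data, durations, num_timesteps=50):
    # sliding_window_view appends the window axis last, giving shape
    # (n_batches, n_features, num_timesteps); transpose it into
    # (n_batches, num_timesteps, n_features). Both are views, so no
    # large allocation happens here.
    windows = sliding_window_view(rucio_data, num_timesteps, axis=0)
    inputs = windows.transpose(0, 2, 1)
    outputs = durations[num_timesteps - 1:]  # target at each window's end
    return inputs, outputs
```

Note that frameworks may still copy the array when converting it to their own tensor type, so for very large data this is best combined with batched feeding as in the answer below.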
Upvotes: 1
Views: 219
Reputation: 486
Instead of preparing all your input in one variable, why don't you generate a reduced-size batch on each call?
def prepare_model_inputs(rucio_data, durations, batch_size=150, num_timesteps=50):
    n_batches = rucio_data.shape[0] - num_timesteps + 1
    while True:  # loop forever; Keras ends each epoch via steps_per_epoch
        for i in range(0, n_batches, batch_size):
            inputs, outputs = [], []
            for j in range(i, min(i + batch_size, n_batches)):
                inputs.append(rucio_data[j:j + num_timesteps])
                outputs.append(durations[j + num_timesteps - 1])
            yield np.stack(inputs), np.stack(outputs)
Now, in your training script, you can call this generator to feed data to your model. In Keras this would look something like:
>>> generator = prepare_model_inputs(rucio_data, durations)
>>> model.fit_generator(generator, ...)
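Because the generator loops forever, fit_generator also needs steps_per_epoch so Keras knows when an epoch ends. A quick sketch of the arithmetic, using the approximate sizes from the question and the default batch_size above:

```python
n_rows = 2_000_000       # approximate row count from the question
num_timesteps = 50
batch_size = 150

n_windows = n_rows - num_timesteps + 1         # number of training windows
steps_per_epoch = -(-n_windows // batch_size)  # ceiling division
print(steps_per_epoch)                         # → 13334
```

That value would then be passed as model.fit_generator(generator, steps_per_epoch=steps_per_epoch, ...).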
Upvotes: 2