Reputation: 75
This is the function I use to build my LSTM inputs.
It takes two arguments: rucio_data (a 2-D NumPy array) and durations (a 1-D NumPy array). The shape of rucio_data is around (2000000, 9).
import numpy as np

def prepare_model_inputs(rucio_data, durations, num_timesteps=50):
    print(rucio_data.shape[0], durations.shape)
    n_examples = rucio_data.shape[0]
    n_batches = n_examples - num_timesteps + 1
    print('Total data points for training/testing: {} of {} timesteps each.'.format(n_batches, num_timesteps))
    inputs = []
    outputs = []
    for i in range(n_batches):
        inputs.append(rucio_data[i:i + num_timesteps])    # one window of inputs
        outputs.append(durations[i + num_timesteps - 1])  # target at the window's end
    print(len(inputs))
    inputs = np.stack(inputs)
    outputs = np.stack(outputs)
    print(inputs.shape, outputs.shape)
    return inputs, outputs
The problem is that my system runs out of memory at the inputs = np.stack(inputs) step.
I need a more memory-efficient way of doing this.
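(For reference: the stacking can be avoided entirely, because every window is a slice of the same array. A minimal sketch assuming NumPy >= 1.20, whose sliding_window_view returns all overlapping windows as a zero-copy view; the function name prepare_model_inputs_view is just for illustration:)

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def prepare_model_inputs_view(rucio_data, durations, num_timesteps=50):
    # sliding_window_view appends the window axis last, giving shape
    # (n_batches, n_features, num_timesteps); transpose it into
    # (n_batches, num_timesteps, n_features). Both are views, so no
    # large allocation happens here.
    windows = sliding_window_view(rucio_data, num_timesteps, axis=0)
    inputs = windows.transpose(0, 2, 1)
    outputs = durations[num_timesteps - 1:]  # target at each window's end
    return inputs, outputs
```

Note that frameworks may still copy the array when converting it to their own tensor type, so for very large data this is best combined with batched feeding as in the answer below.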
Upvotes: 1
Views: 219
Reputation: 486
Instead of preparing all your input in one variable, why don't you generate a reduced-size batch on each call?
def prepare_model_inputs(rucio_data, durations, batch_size=150, num_timesteps=50):
    n_batches = rucio_data.shape[0] - num_timesteps + 1
    while True:  # loop forever; Keras ends each epoch via steps_per_epoch
        for i in range(0, n_batches, batch_size):
            inputs, outputs = [], []
            for j in range(i, min(i + batch_size, n_batches)):
                inputs.append(rucio_data[j:j + num_timesteps])
                outputs.append(durations[j + num_timesteps - 1])
            yield np.stack(inputs), np.stack(outputs)
Now, in your training script, you can call this generator to feed data to your model. In Keras this would look something like:
>>> generator = prepare_model_inputs(rucio_data, durations)
>>> model.fit_generator(generator, ...)
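Because the generator loops forever, fit_generator also needs steps_per_epoch so Keras knows when an epoch ends. A quick sketch of the arithmetic, using the approximate sizes from the question and the default batch_size above:

```python
n_rows = 2_000_000       # approximate row count from the question
num_timesteps = 50
batch_size = 150

n_windows = n_rows - num_timesteps + 1         # number of training windows
steps_per_epoch = -(-n_windows // batch_size)  # ceiling division
print(steps_per_epoch)                         # → 13334
```

That value would then be passed as model.fit_generator(generator, steps_per_epoch=steps_per_epoch, ...).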
Upvotes: 2