Simone

Reputation: 4940

How to shape a Pandas DataFrame for LSTM

I am trying to shape a pandas DataFrame into a format compatible with Keras's timeseries_dataset_from_array() function. The main difficulty is the large number of features in the dataset, which led me to build the input column by column in a for loop:

x_reshaped = [tf.reshape(x_train.iloc[:-seq_length, column].values, (-1, 1))
              for column in range(len(x_train.columns))] 

When I run:

tf.keras.preprocessing.timeseries_dataset_from_array(x_reshaped, y_reshaped,
                                                          sequence_length=seq_length,
                                                          batch_size=bs)

I get the error AttributeError: 'list' object has no attribute 'shape', because x_reshaped is a list of column vectors rather than a single matrix.

I have also tried building the training matrix with tf.TensorArray(), but that did not work either.

Upvotes: 2

Views: 822

Answers (1)

Nicolas Gervais

Reputation: 36674

I wouldn't call that high-dimensional: each column of a pandas DataFrame is 1D, so the frame as a whole is just a 2D table of shape (rows, features). You don't need the reshaping operation or the for loop. You can pass the DataFrame directly to the Keras function:

import pandas as pd
import tensorflow as tf

df = pd.DataFrame({'features1':    [0., 1., 2., 3., 4.],
                   'features2':    [4., 3., 2., 1., 0.],
                   'predictions':  [5., 6., 7., 8., 9.]})

print(df)
   features1  features2  predictions
0        0.0        4.0          5.0
1        1.0        3.0          6.0
2        2.0        2.0          7.0
3        3.0        1.0          8.0
4        4.0        0.0          9.0

Pass the feature columns as data and the target column as targets to the Keras function:

ds = tf.keras.preprocessing.timeseries_dataset_from_array(
     data=df[['features1', 'features2']],
     targets=df['predictions'],
     sequence_length=3)

x, y = next(iter(ds))

print(x.shape)
(3, 3, 2)

The batch x holds three overlapping windows of three time steps each, with two features per step:

<tf.Tensor: shape=(3, 3, 2), dtype=float64, numpy=
array([[[0., 4.],
        [1., 3.],
        [2., 2.]],
       [[1., 3.],
        [2., 2.],
        [3., 1.]],
       [[2., 2.],
        [3., 1.],
        [4., 0.]]])>
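To make the windowing concrete, here is a minimal NumPy sketch that should reproduce the same three windows for this toy example. It is only an illustration of the resulting layout, not how Keras builds the dataset internally, and it assumes NumPy 1.20 or later for sliding_window_view. The names data, seq_length and windows are placeholders of mine.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Same two feature columns as the toy DataFrame, as a (rows, features) array.
data = np.array([[0., 4.],
                 [1., 3.],
                 [2., 2.],
                 [3., 1.],
                 [4., 0.]])
seq_length = 3

# sliding_window_view puts the window axis last: (windows, features, steps).
windows = sliding_window_view(data, window_shape=seq_length, axis=0)

# Swap the last two axes to get (windows, steps, features).
windows = windows.transpose(0, 2, 1)

print(windows.shape)  # (3, 3, 2)
```

Each window is a contiguous block of seq_length rows, and consecutive windows start one row apart, which is why five rows yield three windows.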

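Now that the data has a time-step dimension, with shape (batch, time steps, features), it can be passed to an LSTM layer. The output values depend on the layer's random weight initialization, so yours will differ: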

lstm_layer = tf.keras.layers.LSTM(1)

lstm_layer(x)
<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[0.11129456],
       [0.14443444],
       [0.18482907]], dtype=float32)>
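One thing to check is how the targets line up with the windows. As I read the Keras documentation, targets[i] is paired with the window that starts at row i, so passing the whole predictions column pairs each window with the target of its first row. If you want the target at the last row of each window instead, the targets have to be shifted. The sketch below only illustrates that index arithmetic in plain NumPy under this assumption; the variable names are mine.

```python
import numpy as np

seq_length = 3
predictions = np.array([5., 6., 7., 8., 9.])

# Number of full windows that fit into the series.
n_windows = len(predictions) - seq_length + 1

# Unshifted pairing: target i belongs to the first row of window i.
targets_first_row = predictions[:n_windows]

# Shifted pairing: target i belongs to the last row of window i.
targets_last_row = predictions[seq_length - 1:]

print(targets_first_row)  # [5. 6. 7.]
print(targets_last_row)   # [7. 8. 9.]
```

With the toy DataFrame, passing something like targets=df['predictions'].to_numpy()[seq_length - 1:] should give the shifted pairing, but it is worth printing y for the first batch to confirm that the alignment matches what you intend.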

Upvotes: 2
