I am trying to shape a pandas DataFrame into a format compatible with Keras's timeseries_dataset_from_array() method. The main issue is the large number of features in the dataset, which seems to require a for loop:
x_reshaped = [tf.reshape(x_train.iloc[:-seq_length, column].values, (-1, 1))
              for column in range(len(x_train.columns))]
When I run:
tf.keras.preprocessing.timeseries_dataset_from_array(x_reshaped, y_reshaped,
                                                     sequence_length=seq_length,
                                                     batch_size=bs)
I get the error message AttributeError: 'list' object has no attribute 'shape', because x_reshaped is treated as a list of vectors rather than as a single matrix.
I have also tried to build the training matrix with tf.TensorArray(), but that doesn't work either.
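The error can be reproduced without Keras: a Python list has no shape attribute, whereas a single 2D array does. The snippet below is a minimal sketch with made-up data; x_columns is a hypothetical stand-in for x_reshaped, and NumPy is used instead of TensorFlow only to keep it short.

```python
import numpy as np

# Hypothetical stand-in for the question's x_reshaped:
# one (n_rows, 1) array per feature, collected in a plain Python list.
n_rows, n_features = 6, 3
x_columns = [np.arange(n_rows, dtype=float).reshape(-1, 1) + 10 * j
             for j in range(n_features)]

# A plain list has no .shape, which is what the AttributeError complains about.
list_has_shape = hasattr(x_columns, "shape")

# Joining the columns side by side gives one (n_rows, n_features) matrix.
x_matrix = np.concatenate(x_columns, axis=1)

print(list_has_shape)   # False
print(x_matrix.shape)   # (6, 3)
```

In this sketch the joined matrix holds the same values the DataFrame already had, so the per-column loop adds nothing.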
Reputation: 36674
I wouldn't call that high-dimensional: a pandas DataFrame is a 2D table in which each column holds 1D data. You don't need the reshaping operation or the for loop. You can pass the DataFrame directly to the Keras function:
import pandas as pd
import tensorflow as tf
df = pd.DataFrame({'features1': [0., 1., 2., 3., 4.],
                   'features2': [4., 3., 2., 1., 0.],
                   'predictions': [5., 6., 7., 8., 9.]})
   features1  features2  predictions
0        0.0        4.0          5.0
1        1.0        3.0          6.0
2        2.0        2.0          7.0
3        3.0        1.0          8.0
4        4.0        0.0          9.0
Pass the feature columns and the target column to the Keras function:
ds = tf.keras.preprocessing.timeseries_dataset_from_array(
    data=df[['features1', 'features2']],
    targets=df['predictions'],
    sequence_length=3)
x, y = next(iter(ds))
print(x.shape)
(3, 3, 2)
It will look like this:
<tf.Tensor: shape=(3, 3, 2), dtype=float64, numpy=
array([[[0., 4.],
        [1., 3.],
        [2., 2.]],

       [[1., 3.],
        [2., 2.],
        [3., 1.]],

       [[2., 2.],
        [3., 1.],
        [4., 0.]]])>
Now that it has a time step dimension, it can be passed into an LSTM layer.
lstm_layer = tf.keras.layers.LSTM(1)
lstm_layer(x)
<tf.Tensor: shape=(3, 1), dtype=float32, numpy=
array([[0.11129456],
       [0.14443444],
       [0.18482907]], dtype=float32)>
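To see which rows end up in each window, and which target each window is paired with, the slicing can be mimicked in plain NumPy. This is only a sketch of the behaviour as I understand it from the Keras documentation (window i covers rows i to i + sequence_length - 1 and is paired with targets[i]), not the library's implementation. The helper name make_windows is made up for illustration.

```python
import numpy as np

def make_windows(data, targets, sequence_length):
    """Hypothetical helper: window i is data[i:i + sequence_length],
    paired with targets[i] (the target at the window's first row)."""
    data = np.asarray(data)
    targets = np.asarray(targets)
    # Only full windows are kept, and never more windows than targets.
    n_windows = min(len(data) - sequence_length + 1, len(targets))
    x = np.stack([data[i:i + sequence_length] for i in range(n_windows)])
    y = targets[:n_windows]
    return x, y

# Same values as the example DataFrame above.
features = np.array([[0., 4.], [1., 3.], [2., 2.], [3., 1.], [4., 0.]])
predictions = np.array([5., 6., 7., 8., 9.])

x, y = make_windows(features, predictions, sequence_length=3)
print(x.shape)  # (3, 3, 2)
print(y)        # [5. 6. 7.]
```

Note that under this pairing each window gets the target from its first row. If each window should instead be paired with the value that follows it, the targets would probably need to be shifted before being passed in, for example data=df[['features1', 'features2']].iloc[:-3] and targets=df['predictions'].iloc[3:].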