laiki

Reputation: 25

LSTM: Input 0 of layer sequential is incompatible with the layer

I know there are several questions about this here, but I haven't found one that fits my problem exactly. I'm trying to fit an LSTM with data from Pandas DataFrames but am getting confused about the format I have to provide it in. I created a small code snippet which should show you what I'm trying to do:

import pandas as pd, tensorflow as tf, random
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

targets = pd.DataFrame(index=pd.date_range(start='2019-01-01', periods=300, freq='D'))
targets['A'] = [random.random() for _ in range(len(targets))]
targets['B'] = [random.random() for _ in range(len(targets))] 
features = pd.DataFrame(index=targets.index)
for i in range(len(features)) :
    features[str(i)] = [random.random() for _ in range(len(features))] 

model = Sequential()
model.add(LSTM(units=targets.shape[1], input_shape=features.shape))
model.compile(optimizer='adam', loss='mae')

model.fit(features, targets, batch_size=10, epochs=10)

This results in:

ValueError: Input 0 of layer sequential is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [10, 300]

which I expect relates to the dimensions of the features DataFrame provided. I guess that once this is fixed, the next error will mention the targets DataFrame.

As far as I understand, the 'units' parameter of my first layer defines the output dimensionality of this model. The inputs have to have a 3D shape, but I don't know how to create that out of the 2D world of the DataFrames. I hope you can help me understand the reshape mechanism in Python and how to use it in combination with Pandas DataFrames. (I'm quite new to Python and came from R.)

Thanks in advance

Upvotes: 1

Views: 2574

Answers (3)

laiki

Reputation: 25

Well, just to close this issue out, I would like to share one solution I have meanwhile worked on. The class TimeseriesGenerator in tf.keras.preprocessing.sequence made it quite easy to provide the data in the right shape to an LSTM model:

from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
import numpy as np

window_size   = 7  # number of past rows fed into one prediction
batch_size    = 8
sampling_rate = 1  # use every row within a window

# X_train/y_train etc. are assumed to be pre-split Pandas DataFrames.
train_gen = TimeseriesGenerator(X_train.values, y_train.values,
                                length=window_size, sampling_rate=sampling_rate,
                                batch_size=batch_size)

valid_gen = TimeseriesGenerator(X_valid.values, y_valid.values,
                                length=window_size, sampling_rate=sampling_rate,
                                batch_size=batch_size)

test_gen  = TimeseriesGenerator(X_test.values, y_test.values,
                                length=window_size, sampling_rate=sampling_rate,
                                batch_size=batch_size)
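
A batch from these generators already has the 3D shape an LSTM expects, and in TF2 they can be passed straight to fit. A minimal usage sketch, assuming a compiled Keras model named model built for window_size timesteps:

# Each batch is a pair (x, y); x has shape (batch_size, window_size, n_features).
x_batch, y_batch = train_gen[0]
print(x_batch.shape, y_batch.shape)

model.fit(train_gen, validation_data=valid_gen, epochs=10)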

There are many other ways to implement generators, e.g. using more_itertools, which provides the function windowed, or tensorflow's Dataset and its window function. For me the TimeseriesGenerator was sufficient for the tests I did. In case you would like to see an example modeling the DAX based on some stocks, I'm sharing a notebook on GitHub.

Upvotes: 0

mujjiga

Reputation: 16916

Let's look at a few popular ways in which LSTMs are used.

Many to Many

Example: You have a sentence (composed of words in sequence). Given this sequence of words, you would like to predict the part of speech (POS) of each word.


So you have n words and you feed one word per timestep to the LSTM. Each LSTM timestep (also called LSTM unrolling) will produce an output. Each word is represented by a set of features, normally word embeddings. So the input to the LSTM is of size batch_size X time_steps X features.

Keras code:

from tensorflow import keras
import numpy as np

inputs = keras.Input(shape=(10, 3))  # 10 timesteps, 3 features per word
lstm = keras.layers.LSTM(8, return_sequences=True)(inputs)  # one output per timestep
outputs = keras.layers.TimeDistributed(keras.layers.Dense(5, activation='softmax'))(lstm)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')

X = np.random.randn(4, 10, 3)
y = np.random.randint(0, 2, size=(4, 10, 5))

model.fit(X, y, epochs=2)
print(model.predict(X).shape)
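
The printed prediction shape is (4, 10, 5): because return_sequences=True, the model emits one 5-way softmax per timestep for each of the 4 input sequences.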

Many to One

Example: Again you have a sentence (composed of words in sequence). Given this sequence of words, you would like to predict the sentiment of the sentence, i.e. whether it is positive or negative.


Keras code:

inputs = keras.Input(shape=(10, 3))
lstm = keras.layers.LSTM(8, return_sequences=False)(inputs)  # keep only the last output
outputs = keras.layers.Dense(5, activation='softmax')(lstm)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')

X = np.random.randn(4, 10, 3)
y = np.random.randint(0, 2, size=(4, 5))

model.fit(X, y, epochs=2)
print(model.predict(X).shape)
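
Here the predicted shape is (4, 5): with return_sequences=False, only the final LSTM output is kept, giving a single 5-way prediction per sequence.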

Many to multi-headed

Example: You have a sentence (composed of words in sequence). Given this sequence of words, you would like to predict the sentiment of the sentence as well as the author of the sentence.

This is a multi-headed model, where one head will predict the sentiment and another head will predict the author. Both heads share the same LSTM backbone.


Keras code:

inputs = keras.Input(shape=(10, 3))
lstm = keras.layers.LSTM(8, return_sequences=False)(inputs)  # shared LSTM backbone
output_A = keras.layers.Dense(5, activation='softmax')(lstm)  # sentiment head
output_B = keras.layers.Dense(5, activation='softmax')(lstm)  # author head

model = keras.Model(inputs=inputs, outputs=[output_A, output_B])
model.compile(loss='categorical_crossentropy', optimizer='adam')

X = np.random.randn(4, 10, 3)
y_A = np.random.randint(0, 2, size=(4, 5))
y_B = np.random.randint(0, 2, size=(4, 5))

model.fit(X, [y_A, y_B], epochs=2)
y_hat_A, y_hat_B = model.predict(X)
print(y_hat_A.shape, y_hat_B.shape)

What you are looking for is the many-to-multi-headed model, where your predictions for A are made by one head while another head makes predictions for B.
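
A minimal sketch of that idea applied to your setup (window length and layer width are illustrative choices; since A and B are continuous values, each head here is a Dense(1) regression trained with mae):

import numpy as np
from tensorflow import keras

time_steps, n_features = 7, 300  # assumed window over your 300 feature columns

inputs = keras.Input(shape=(time_steps, n_features))
lstm = keras.layers.LSTM(8)(inputs)               # shared backbone, last output only
output_A = keras.layers.Dense(1, name='A')(lstm)  # head predicting target A
output_B = keras.layers.Dense(1, name='B')(lstm)  # head predicting target B

model = keras.Model(inputs=inputs, outputs=[output_A, output_B])
model.compile(optimizer='adam', loss='mae')

X = np.random.randn(32, time_steps, n_features)  # dummy windows
y_A = np.random.randn(32, 1)
y_B = np.random.randn(32, 1)
model.fit(X, [y_A, y_B], epochs=2)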

Upvotes: 1

Marcus

Reputation: 339

The input data for the LSTM has to be 3D.

If you print the shapes of your DataFrames you get:

targets : (300, 2)
features : (300, 300)

The input data has to be reshaped into (samples, time steps, features). This means that targets and features must cover the same samples: one target row for each input window.

You need to choose a number of time steps for your problem, in other words, how many past samples will be used to make one prediction.

For example, if you have 300 days and 2 features, the time step can be 3, so that three days are used to make one prediction (you can choose this arbitrarily). Here is the code for reshaping your data (with a few more changes):

import pandas as pd
import numpy as np
import tensorflow as tf
import random
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

data = pd.DataFrame(index=pd.date_range(start='2019-01-01', periods=300, freq='D'))
data['A'] = [random.random() for _ in range(len(data))]
data['B'] = [random.random() for _ in range(len(data))]

# Choose the time_step size.
time_steps = 3
# Use numpy for the 3D array as it is easier to handle.
data = np.array(data)

def make_x_y(ts, data):
    """
    Parameters
    ts : int
        Number of time steps per input window.
    data : numpy array
        Array of shape (samples, features).

    This function creates two arrays, x and y.
    x holds windows of ts consecutive rows (the input data),
    y holds the row following each window (the target data).
    """
    x, y = [], []
    for offset in range(len(data) - ts):
        x.append(data[offset:offset + ts])
        y.append(data[offset + ts])
    return np.array(x), np.array(y)

x, y = make_x_y(time_steps, data)

print(x.shape, y.shape)

nodes = 100  # This is the width of the network.
out_size = 2  # Number of outputs produced by the network, one per predicted column ('A' and 'B').

model = Sequential()
model.add(LSTM(units=nodes, input_shape=(x.shape[1], x.shape[2])))
model.add(Dense(out_size))  # For the output a Dense (fully connected) layer is used.
model.compile(optimizer='adam', loss='mae')
model.fit(x, y, batch_size=10, epochs=10)
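
As a quick sanity check on the fitted model, predicting on a single window returns one row with out_size values:

pred = model.predict(x[:1])  # one window of shape (1, time_steps, 2)
print(pred.shape)            # (1, 2): one value each for 'A' and 'B'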

Upvotes: 0
