Shlomi Schwartz

Reputation: 8903

LSTM autoencoder: use the first LSTM output as the target for the decoder

I have sequences of 10 days of sensor events, plus a true/false label specifying whether the sensor triggered an alert within that 10-day window:

sensor_id  timestamp                feature_1  feature_2  10_days_alert_label
1          2020-12-20 01:00:34.565  0.23       0.1        1
1          2020-12-20 01:03:13.897  0.3        0.12       1
2          2020-12-20 01:00:34.565  0.13       0.4        0
2          2020-12-20 01:03:13.897  0.2        0.9        0
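
For reference, a minimal sketch (assuming the events sit in a pandas DataFrame named df with the columns above, and that each sensor's events are truncated / zero-padded to a fixed length) of how the table could be reshaped into the [samples, timesteps, features] layout Keras expects:

# sketch: group events per sensor and build fixed-length sequences
# assumes a pandas DataFrame df with the columns shown above
import numpy as np
import pandas as pd

TIMESTEPS = 10                         # assumed fixed number of events kept per sensor
FEATURES = ['feature_1', 'feature_2']

sequences, labels = [], []
for sensor_id, events in df.sort_values('timestamp').groupby('sensor_id'):
    values = events[FEATURES].to_numpy()[:TIMESTEPS]              # truncate long sequences
    pad = np.zeros((TIMESTEPS - len(values), len(FEATURES)))      # zero-pad short ones
    sequences.append(np.vstack([values, pad]))
    labels.append(events['10_days_alert_label'].iloc[0])          # one label per sensor

X = np.stack(sequences)   # shape: (n_sensors, TIMESTEPS, n_features)
y = np.array(labels)      # shape: (n_sensors,)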

95% of the sensors never trigger an alert, so the data is imbalanced. I was thinking of an autoencoder model to detect the anomalies (sensors that triggered an alert). Since I'm not interested in decoding the entire sequence, just the LSTM's learned context vector, I had something like the figure below in mind, where the decoder reconstructs the encoder output:

[Figure: proposed architecture in which the decoder reconstructs the encoder's context vector]

I've googled around and found this simple LSTM auto encoder example:

# lstm autoencoder recreate sequence
from numpy import array
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import RepeatVector
from tensorflow.keras.layers import TimeDistributed
from tensorflow.keras.utils import plot_model
# define input sequence
sequence = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
# reshape input into [samples, timesteps, features]
n_in = len(sequence)
sequence = sequence.reshape((1, n_in, 1))
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_in,1)))
model.add(RepeatVector(n_in))
model.add(LSTM(100, activation='relu', return_sequences=True))
model.add(TimeDistributed(Dense(1)))
model.compile(optimizer='adam', loss='mse')
# fit model
model.fit(sequence, sequence, epochs=300, verbose=0)
plot_model(model, show_shapes=True, to_file='reconstruct_lstm_autoencoder.png')
# demonstrate recreation
yhat = model.predict(sequence, verbose=0)
print(yhat[0,:,0])

I would like to modify the above example so the first LSTM output is used as the decoder target. Something like:

# lstm autoencoder recreate sequence
from numpy import array
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import RepeatVector
from tensorflow.keras.layers import TimeDistributed
from tensorflow.keras.utils import plot_model
# define input sequence
sequence = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
# reshape input into [samples, timesteps, features]
n_in = len(sequence)
sequence = sequence.reshape((1, n_in, 1))
# define model
model = Sequential()
model.add(LSTM(100, activation='relu', input_shape=(n_in,1)))

model.add(Dense(100, activation='relu')) # First LSTM output
model.add(Dense(32, activation='relu')) # Bottleneck 
model.add(Dense(100, activation='sigmoid')) # Decoded vector

model.compile(optimizer='adam', loss='mse')

# fit model
model.fit(sequence, FIRST_LSTM_OUTPUT, epochs=300, verbose=0) # <--- ???

Q: Can I use the first LSTM output vector as a target?

Upvotes: 1

Views: 636

Answers (2)

Innat

Reputation: 17219

What you need is a way to compute a loss on intermediate layers or on the output of hidden activations (e.g., the first LSTM and the last Dense in your case). In tf.keras we can do that using the add_loss() method; a loss added this way may depend on the layer's inputs (tensors). You can read about the purpose of add_loss in my answer here in great detail, and also check this discussion thread regarding the issue.

import tensorflow as tf
from numpy import array
from tensorflow.keras.layers import LSTM, Dense

# data
sequence = array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
n_in = len(sequence)
sequence = sequence.reshape((1, n_in, 1))

# layers 
inputs = tf.keras.Input(shape=(n_in,1))
lx = LSTM(100, activation=tf.nn.relu)(inputs) # first lstm 
x = Dense(100, activation=tf.nn.relu)(lx)
x = Dense(32, activation=tf.nn.relu)(x)
outputs = Dense(100, activation=tf.nn.sigmoid)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# add_loss: compute the MSE between the model's
# last layer and the output of the first LSTM
model.add_loss(tf.keras.metrics.mean_squared_error(outputs, lx))
model.compile(optimizer='adam') 

# fit model
# no target needed; the loss was already attached via add_loss
model.fit(sequence, epochs=3, verbose=1) 

Epoch 1/3
1/1 [==============================] - 1s 1s/step - loss: 0.2270
Epoch 2/3
1/1 [==============================] - 0s 18ms/step - loss: 0.2250
Epoch 3/3
1/1 [==============================] - 0s 15ms/step - loss: 0.2227
model.predict(sequence).shape
(1, 100)
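
Since the end goal is anomaly detection, here is a minimal follow-up sketch (assuming the inputs, lx, outputs tensors and the trained model defined above): build a second model that returns both the first LSTM output and its reconstruction, and score each sample by their mean squared difference.

# score samples by how well the decoder reconstructs the first LSTM output
# assumes inputs, lx, outputs, the trained model and sequence from above
import numpy as np

scorer = tf.keras.Model(inputs=inputs, outputs=[lx, outputs])
enc_vec, rec_vec = scorer.predict(sequence, verbose=0)
anomaly_score = np.mean(np.square(enc_vec - rec_vec), axis=-1)  # one score per sample
print(anomaly_score)  # higher value = worse reconstruction = more anomalous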

Upvotes: 1

Marco Cerliani

Reputation: 22021

You can do it using model.add_loss. In add_loss we specify the loss of interest (in our case: MSE) and the layers used to compute it (in our case: the LSTM output and the model predictions).

Below is a dummy example:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

n_sample, timesteps = 100, 9
X = np.random.uniform(0, 1, (n_sample, timesteps, 1))

def mse(enc_output, pred):
    return  tf.reduce_mean(tf.square(enc_output - pred))
    
inp = Input((timesteps,1,))
enc = LSTM(100, activation='relu')(inp)
x = Dense(100, activation='relu')(enc)
x = Dense(32, activation='relu')(x)
out = Dense(100, activation='sigmoid')(x)
model = Model(inp, out)

model.add_loss(mse(enc, out))
model.compile(optimizer='adam', loss=None)
model.fit(X, y=None, epochs=3)

Here is the running code.
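
As a possible follow-up sketch (assuming the trained model, the inp / enc / out tensors and the data X defined above), the per-sequence reconstruction error can then be thresholded, e.g. at a high percentile, to flag the rare sensors that behave unlike the majority:

# sketch: flag anomalous sequences via the encoder/decoder reconstruction gap
# assumes inp, enc, out, the trained model and the data X from above
scorer = Model(inp, [enc, out])
enc_vec, rec_vec = scorer.predict(X, verbose=0)
scores = np.mean(np.square(enc_vec - rec_vec), axis=-1)  # one score per sequence

threshold = np.percentile(scores, 95)  # assumption: ~5% of sensors raise alerts
is_anomaly = scores > threshold
print(is_anomaly.sum(), "sequences flagged as anomalous")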

Upvotes: 2
