j sad

Reputation: 1095

How do I mask multi-output in TensorFlow 2 LSTM training?

I am training an LSTM model in TensorFlow 2 to predict two outputs, streamflow and water temperature. At many timesteps, a label is available for only one of the two variables, or for neither.

So the loss function needs to ignore the temperature or streamflow loss at timesteps where that variable has no label. I've done quite a bit of reading in the TF docs, but I'm struggling to figure out how best to do this.

So far I've tried passing a mask as sample_weight with sample_weight_mode='temporal'.

When I do this, I get an error asking me to pass a 2D array. But that confuses me because there are 3 dimensions: n_samples, sequence_length, and n_outputs.

Here's some code of what I am basically trying to do:

import tensorflow as tf
import numpy as np

# set up the model
simple_lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(8, return_sequences=True),
    tf.keras.layers.Dense(2)
])

simple_lstm_model.compile(optimizer='adam', loss='mae',
                          sample_weight_mode='temporal')

n_sample = 2
seq_len = 10
n_feat = 5
n_out = 2

# random in/out
x = np.random.randn(n_sample, seq_len, n_feat)
y_true = np.random.randn(n_sample, seq_len, n_out)

# set the initial mask as all ones (everything counts equally)
mask = np.ones([n_sample, seq_len, n_out])
# set the mask so that in sample 0, at timesteps 3 through 7,
# output 1 is not counted in the loss function
mask[0, 3:8, 1] = 0

simple_lstm_model.fit(x, y_true, sample_weight=mask)

The error:

ValueError: Found a sample_weight array with shape (2, 10, 2). In order to use timestep-wise sample weighting, you should
pass a 2D sample_weight array.

Any ideas? I must be misunderstanding what sample_weights does, because to me it only makes sense if the sample_weight array has the same dimensions as the output. I could write a custom loss function and handle the masking manually, but it seems like there should be a more general or built-in solution.

Upvotes: 2

Views: 825

Answers (1)

Szymon Maszke

Reputation: 24691

1. sample_weights

Yes, you understand it incorrectly. In this case you have 2 samples, each with 10 timesteps of 5 features. You could pass a 2D tensor of shape (n_samples, seq_len) so that each timestep of each sample contributes differently to the total loss, while all outputs at a given timestep are weighted equally (as is usually the case).
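
For illustration, here is a minimal sketch of the 2D weighting Keras does accept, assuming the model is still compiled with sample_weight_mode='temporal' as in your question (weights_2d is just an illustrative name). Note that it can only down-weight whole timesteps, not individual outputs:

# One weight per (sample, timestep); there is no per-output axis
weights_2d = np.ones([n_sample, seq_len])
# This removes timesteps 3 through 7 of sample 0 from the loss,
# but for BOTH outputs at once
weights_2d[0, 3:8] = 0

simple_lstm_model.fit(x, y_true, sample_weight=weights_2d)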

That's not what you are after at all. You want to mask certain loss values after they are calculated, so that they do not contribute to the total.

2. Custom loss

One possible solution is to implement your own loss function, which multiplies the loss tensor by the mask before taking the mean or sum.

Basically, you pass the mask and y_true concatenated together along the last axis and split them apart inside the function. This is sufficient:

def my_loss_function(y_true_mask, y_pred):
    # Recover y_true and the mask, which were concatenated along the last axis
    y_true, mask = tf.split(y_true_mask, 2, axis=-1)
    # You could use reduce_sum or other combinations
    return tf.math.reduce_mean(tf.math.abs(y_true - y_pred) * mask)
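
Concatenating along the last axis (rather than stacking along a new leading axis) keeps the sample axis first, so Keras can slice y_true_mask into batches the same way it slices x.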

Now your code (no weighting as it's not needed):

import tensorflow as tf
import numpy as np

simple_lstm_model = tf.keras.models.Sequential(
    [tf.keras.layers.LSTM(8, return_sequences=True), tf.keras.layers.Dense(2)]
)

simple_lstm_model.compile(optimizer="adam", loss=my_loss_function)

n_sample = 2
seq_len = 10
n_feat = 5
n_out = 2

x = np.random.randn(n_sample, seq_len, n_feat)
y_true = np.random.randn(n_sample, seq_len, n_out)

mask = np.ones([n_sample, seq_len, n_out])
mask[0, 3:8, 1] = 0

# Concatenate y and the mask along the last axis
y_true_mask = np.concatenate([y_true, mask], axis=-1)

simple_lstm_model.fit(x, y_true_mask)

And it works. You could also combine the values in some other way, but I hope you get a feel for how one could do it.

3. Masking outputs

Please note that the above introduces a couple of problems. If the mask contains a lot of zeros and you take the mean, the loss value can become very small and inhibit learning; on the other hand, if you go with the sum, it can explode.
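
One way around both issues (a sketch of my own, not the only option; masked_mae is a hypothetical name) is to divide the summed loss by the number of unmasked entries, so the scale does not depend on how much is masked:

def masked_mae(y_true_mask, y_pred):
    # Same last-axis split as above
    y_true, mask = tf.split(y_true_mask, 2, axis=-1)
    masked_error = tf.math.abs(y_true - y_pred) * mask
    # Average over unmasked entries only; the maximum() guards against
    # division by zero when everything in a batch is masked
    return tf.math.reduce_sum(masked_error) / tf.math.maximum(
        tf.math.reduce_sum(mask), 1.0)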

Upvotes: 5
