Reputation: 1095
I am training an LSTM model in TensorFlow 2 to predict two outputs, streamflow and water temperature. Not every timestep has a label for both variables, so the loss function needs to ignore the temperature or streamflow loss when the corresponding label is missing. I've done quite a bit of reading in the TF docs, but I'm struggling to figure out the best way to do this.
So far I've tried setting sample_weight_mode='temporal' when compiling the model and then passing a sample_weight numpy array when calling fit. When I do this, I get an error asking me to pass a 2D array. That confuses me, because there are 3 dimensions: n_samples, sequence_length, and n_outputs.
Here's some code of what I am basically trying to do:
import tensorflow as tf
import numpy as np
# set up the model
simple_lstm_model = tf.keras.models.Sequential([
    tf.keras.layers.LSTM(8, return_sequences=True),
    tf.keras.layers.Dense(2)
])
simple_lstm_model.compile(optimizer='adam', loss='mae',
                          sample_weight_mode='temporal')
n_sample = 2
seq_len = 10
n_feat = 5
n_out = 2
# random in/out
x = np.random.randn(n_sample, seq_len, n_feat)
y_true = np.random.randn(n_sample, seq_len, n_out)
# set the initial mask as all ones (everything counts equally)
mask = np.ones([n_sample, seq_len, n_out])
# set the mask so that in sample 0, for timesteps 3 through 7,
# output variable 1 is not counted in the loss function
mask[0, 3:8, 1] = 0
simple_lstm_model.fit(x, y_true, sample_weight=mask)
The error:
ValueError: Found a sample_weight array with shape (2, 10, 2). In order to use timestep-wise sample weighting, you should pass a 2D sample_weight array.
Any ideas? I must not understand what sample_weights do, because to me it only makes sense if the sample_weight array has the same dimensions as the output. I could write a custom loss function and handle the masking manually, but it seems like there should be a more general or built-in solution.
Upvotes: 2
Views: 825
Reputation: 24691
Yes, you are misunderstanding sample_weights. In this case you have 2 samples and 10 timesteps with 5 features each. You could pass a 2D tensor of shape (n_samples, seq_len) so that each timestep of each sample contributes differently to the total loss, with all outputs weighted equally (as is usually the case).
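For contrast, here is a numpy sketch of roughly what that 2D per-timestep weighting computes (a simplification of Keras' actual reduction; the variable names are mine). It also shows why this mode cannot mask a single output: both outputs at a timestep necessarily share one weight.

```python
import numpy as np

n_sample, seq_len, n_out = 2, 10, 2
rng = np.random.default_rng(0)
y_true = rng.standard_normal((n_sample, seq_len, n_out))
y_pred = rng.standard_normal((n_sample, seq_len, n_out))

# A temporal sample_weight Keras accepts: one weight per (sample, timestep)
w = np.ones((n_sample, seq_len))
w[0, 3:8] = 0.0  # zeroes out timesteps 3-7 of sample 0 as a whole

# The per-timestep loss (here MAE, averaged over both outputs) is scaled
# by the weight -- there is no per-output weight in this scheme
per_step_mae = np.abs(y_true - y_pred).mean(axis=-1)  # shape (n_sample, seq_len)
weighted_loss = (per_step_mae * w).sum() / w.sum()
```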
That's not what you are after at all. You want to mask certain loss values after they are calculated, so that they do not contribute.
One possible solution is to implement your own loss function, which multiplies the loss tensor by the mask before taking the mean or sum. Basically, you pass the mask and the target tensor concatenated together along the last axis and split them apart within the function. This is sufficient:
def my_loss_function(y_true_mask, y_pred):
    # Recover y_true and mask (concatenated along the last axis)
    y_true, mask = tf.split(y_true_mask, 2, axis=-1)
    # You could use reduce_sum or another reduction instead
    return tf.math.reduce_mean(tf.math.abs(y_true - y_pred) * mask)
Now your code (no sample weighting, as it's not needed):
simple_lstm_model = tf.keras.models.Sequential(
    [tf.keras.layers.LSTM(8, return_sequences=True), tf.keras.layers.Dense(2)]
)
simple_lstm_model.compile(optimizer="adam", loss=my_loss_function)
n_sample = 2
seq_len = 10
n_feat = 5
n_out = 2
x = np.random.randn(n_sample, seq_len, n_feat)
y_true = np.random.randn(n_sample, seq_len, n_out)
mask = np.ones([n_sample, seq_len, n_out])
mask[0, 3:8, 1] = 0
# Concatenate y_true and the mask along the last axis,
# so the batch dimension still matches x
y_true_mask = np.concatenate([y_true, mask], axis=-1)
simple_lstm_model.fit(x, y_true_mask)
And so it works. You could also stack the values in some other way, but I hope you get the feel of how one could do it.
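For instance, stacking along a new trailing axis (so the batch dimension stays first) and slicing inside the loss also works. A sketch, where my_loss_function_stacked is a hypothetical variant, not part of the original answer:

```python
import numpy as np
import tensorflow as tf

n_sample, seq_len, n_out = 2, 10, 2
y_true = np.random.randn(n_sample, seq_len, n_out).astype("float32")
mask = np.ones((n_sample, seq_len, n_out), dtype="float32")
mask[0, 3:8, 1] = 0

# Stack along a new trailing axis: shape (n_sample, seq_len, n_out, 2)
y_true_mask = np.stack([y_true, mask], axis=-1)

def my_loss_function_stacked(y_true_mask, y_pred):
    # Slice the stacked tensor apart instead of using tf.split
    y_true = y_true_mask[..., 0]  # (batch, seq_len, n_out)
    mask = y_true_mask[..., 1]
    return tf.math.reduce_mean(tf.math.abs(y_true - y_pred) * mask)
```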
Please notice the above introduces a few problems. If you have a lot of zeros in the mask and take the mean, you might get a really small loss value and inhibit learning; on the other hand, if you go with the sum, it might explode.
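One way around both problems (not from the original answer, just a common pattern): normalize by the number of unmasked entries instead of the full tensor size, so the loss magnitude does not depend on how many labels are missing. A sketch, with masked_mae as a hypothetical name:

```python
import tensorflow as tf

def masked_mae(y_true_mask, y_pred):
    # Recover targets and mask, concatenated along the last axis
    y_true, mask = tf.split(y_true_mask, 2, axis=-1)
    abs_err = tf.math.abs(y_true - y_pred) * mask
    # Divide by the count of unmasked entries (epsilon guards against
    # an all-zero mask), giving a mean over labeled entries only
    return tf.math.reduce_sum(abs_err) / (tf.math.reduce_sum(mask) + 1e-8)
```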
Upvotes: 5