Reputation: 67
I would like to write a custom loss function for a seq2seq problem. My input (X) has shape (N, M), that is, N sequences of length M each. Each sequence contains the numbers 1 to M/2, each repeated twice, in random order. Here is an example with M=200:
X = array([[ 60., 71., 15., ..., 73., 64., 71.],
[ 71., 37., 19., ..., 78., 34., 65.],
[ 50., 41., 91., ..., 57., 59., 4.],
...,
[ 2., 66., 79., ..., 25., 66., 13.],
[ 16., 25., 11., ..., 83., 74., 38.],
[ 73., 100., 91., ..., 48., 61., 51.]])
y = array([[1., 1., 1., ..., 0., 0., 0.],
[1., 1., 1., ..., 1., 1., 1.],
[0., 0., 0., ..., 1., 1., 1.],
...,
[0., 0., 0., ..., 0., 1., 1.],
[1., 1., 1., ..., 0., 1., 1.],
[0., 0., 0., ..., 1., 1., 1.]])
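For concreteness, a minimal sketch of how inputs with this structure could be generated (the y values here are random placeholders, since the rule mapping X to y is not described above):

import numpy as np

rng = np.random.default_rng(0)
N, M = 1000, 200

# Each row: the numbers 1..M/2, each repeated twice, shuffled into random order
X = np.stack([rng.permutation(np.repeat(np.arange(1, M // 2 + 1), 2))
              for _ in range(N)]).astype(np.float32)

# Placeholder binary labels with the same shape as X
y = rng.integers(0, 2, size=(N, M)).astype(np.float32)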
I reshape them to:
X_ = X.reshape(X.shape[0],1,X.shape[1])
y_ = y.reshape(y.shape[0],1,y.shape[1])
I would like the loss to be calculated based on the number of times there is a change in the y_pred (and y) sequences. For instance, if my output is y_pred = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1], the number of times there is a change from 0 to 1 (or vice versa) is 4.
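As a quick sanity check of that count with NumPy (non-differentiable, just to illustrate the target quantity):

import numpy as np

y_pred = np.array([1, 0, 1, 1, 1, 0, 0, 0, 1, 1])
n_changes = np.count_nonzero(np.diff(y_pred))  # adjacent elements that differ
print(n_changes)  # 4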
Here is my network:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential()
model.add(LSTM(400, input_shape=(1, X_.shape[2]), activation='relu', return_sequences=True))
model.add(LSTM(350, activation='relu', return_sequences=False))
model.add(Dense(200, activation='softmax'))
model.compile(loss=my_loss_fn, optimizer='Adam')
And this is the loss function I tried to write:
import tensorflow as tf

def my_loss_fn(y, y_pred):
    # Count the positions where consecutive elements differ, in both tensors
    c1 = tf.math.count_nonzero(tf.experimental.numpy.diff(y) != 0)
    c2 = tf.math.count_nonzero(tf.experimental.numpy.diff(y_pred) != 0)
    return tf.math.subtract(c1, c2)
The problem is that I get this error when I fit the model:
ValueError: No gradients provided for any variable
This most probably happens because numpy.diff is not differentiable, as pointed out here (Numpy or SciPy Derivative function for non-uniform spacing?) and here (https://discuss.pytorch.org/t/differentiable-version-of-numpy-diff/89347/4).
How could I create a differentiable version of my function?
Upvotes: 0
Views: 364
Reputation: 67
The problem was that I needed to use only differentiable operations. I found a list of TensorFlow operations (https://www.tensorflow.org/api_docs/python/tf/raw_ops) and modified the custom loss function accordingly:
import tensorflow as tf

def loss_fn(y_true, y_):
    a1 = tf.roll(y_true, shift=1, axis=1)  # y_true shifted one step along the sequence
    c1 = tf.subtract(a1, y_true)           # rolled difference of y_true
    a2 = tf.roll(y_, shift=1, axis=1)      # y_ shifted one step along the sequence
    c2 = tf.subtract(a2, y_)               # rolled difference of y_
    return tf.math.reduce_mean(tf.square(c1 - c2))
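A quick check of this loss on toy tensors of shape (batch, M), matching axis=1 above:

import tensorflow as tf

y_true = tf.constant([[1., 0., 1., 1., 1., 0., 0., 0., 1., 1.]])
y_hat = tf.constant([[1., 1., 1., 1., 1., 0., 0., 0., 0., 0.]])
print(loss_fn(y_true, y_hat).numpy())  # 0.4 for this pair

Note that tf.roll wraps around, so position 0 is compared against the last element of each sequence; since roll, subtract, square, and reduce_mean are all differentiable, gradients can flow through this loss.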
Upvotes: 1