DARK_DUCK

Reputation: 1777

Keras component selection in loss calculation

I have this simple model (5 input channels), and I expect it to learn to return the second channel:

import keras
import numpy as np
import keras.backend as K

data = np.random.normal(size = (1000, 5))

model = keras.models.Sequential()
model.add(keras.layers.Dense(10, activation = 'linear',input_shape = (5,)))
model.add(keras.layers.Dense(1, activation = 'linear'))

# Keras calls the loss as loss(y_true, y_pred): x is the target, y the model output
def loss(x, y):
    return K.mean(K.square(x - y))

model.compile('adam', loss)
model.fit(data, data[:, 1], epochs = 100)

It works great, and the loss goes to zero.

Then I tweak it a little: I add an extra channel to the output and decide that I don't care about the second output channel.

I change it to this:

import keras
import numpy as np
import keras.backend as K

data = np.random.normal(size = (1000, 5))

model = keras.models.Sequential()
model.add(keras.layers.Dense(10, activation = 'linear',input_shape = (5,)))
model.add(keras.layers.Dense(2, activation = 'linear'))

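# Keras calls the loss as loss(y_true, y_pred): x is the target, y the 2-channel output
def loss(x, y):
    # Compare the target against the first output channel only
    return K.mean(K.square(x - y[:, 0]))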

model.compile('adam', loss)
model.fit(data, data[:, 1], epochs = 100)

And now the model is impossible to train. It seems crazy to me. Does anyone know what is happening?

PS: This example might seem stupid, but for a more complex problem I need to compute a custom loss, and I reduced the problem to this simple example.

Thank you for your help

Upvotes: 2

Views: 300

Answers (1)

DARK_DUCK

Reputation: 1777

After hours of struggling I finally have a fix (and a potential explanation).

The problem in this example (and the only difference from the working version) is the index selection. Although indexing seems to be supported by TensorFlow, it does not behave as expected here (and the problematic snippet fails outright under the Theano backend). The loss value appears to be computed correctly, but the derivative seems to be wrong, which misleads the optimizer; this is why the network does not train. A hacky but perfectly working solution I found is to replace

y[:, 0]

by

tensorflow.matmul(y, [[1.0], [0.0]])

I did not try it, but keras.backend.dot should work too if you need multi-backend support. Be careful to use floats rather than integers in the weight matrix; otherwise it will not typecheck.
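
One thing worth checking is the shape of the two operands, since the slice and the matmul do not return the same shape. The sketch below uses plain NumPy rather than Keras, and it rests on an assumption that is not confirmed in this post: that the 1-D target data[:, 1] reaches the loss with shape (batch, 1). Under that assumption, subtracting the (batch,) slice broadcasts to a (batch, batch) matrix, while the matmul result keeps shape (batch, 1). The names y_true, y_pred, sliced, selected and kept are illustrative only, and the y_pred[:, 0:1] variant is a suggestion that has not been tested inside Keras here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4  # small batch so the shapes are easy to read

# Assumption: the target arrives in the loss with shape (batch, 1)
y_true = rng.normal(size=(n, 1))

# Pretend the network is already perfect on its first output channel;
# the second channel is arbitrary noise that we do not care about
y_pred = np.concatenate([y_true, rng.normal(size=(n, 1))], axis=1)  # (n, 2)

sliced = y_pred[:, 0]                         # shape (n,)   -> axis dropped
selected = np.matmul(y_pred, [[1.0], [0.0]])  # shape (n, 1) -> axis kept
kept = y_pred[:, 0:1]                         # shape (n, 1) -> axis kept

diff_sliced = y_true - sliced      # (n, 1) - (n,) broadcasts to (n, n)
diff_selected = y_true - selected  # (n, 1) - (n, 1) stays (n, 1)

loss_sliced = np.mean(np.square(diff_sliced))
loss_selected = np.mean(np.square(diff_selected))

print(sliced.shape, selected.shape, kept.shape)
print(diff_sliced.shape, diff_selected.shape)
print(loss_sliced, loss_selected)
```

If the assumption holds, the sliced version compares every target with every prediction in the batch, so its loss stays above zero even for a perfect first channel, which would be consistent with the training failure described in the question. Printing K.int_shape of both arguments inside the loss is a quick way to confirm or rule this out in a given Keras setup.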

I hope this helps someone else.

Upvotes: 2
