King5ters

Reputation: 31

Customizing loss function in Keras with condition

I want to set up a Keras model (TensorFlow backend) for a multi-class classification problem with 4 different classes. I have both labeled and unlabeled data.

I have worked out the case in which I only train with the labeled data and my model looks something like this:

from tensorflow import keras
from tensorflow.keras import layers, optimizers

# config.variables, loss_function, train_data, train_class_labels and
# class_weights are defined elsewhere in my code

# create model
inputs = keras.Input(shape=(len(config.variables), ))
X = layers.Dense(units=200, activation="relu")(inputs)
output = layers.Dense(units=4, activation="softmax", name="output")(X)

model = keras.Model(inputs=inputs, outputs=output)
model.compile(optimizer=optimizers.Adam(1e-4), loss=loss_function, metrics=["accuracy"])

# train model
model.fit(
    x=train_data,
    y=train_class_labels,
    batch_size=200,
    epochs=200,
    verbose=2,
    validation_split=0.2,
    sample_weight=class_weights,
    )

I have functioning models with two different losses, namely categorical_crossentropy and sparse_categorical_crossentropy. Depending on the loss function, my train_class_labels were either in one-hot representation (e.g. [ [0,1,0,0], [0,0,0,1], ...]) or in integer representation (e.g. [0,0,2,1,0,3, ...]), and everything worked fine. class_weights is some weight vector ([0.78, 1.34, ...]).
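For reference, switching between the two representations is a one-liner with keras.utils.to_categorical (class count 4, matching the example above):

import numpy as np
from tensorflow import keras

int_labels = np.array([0, 0, 2, 1, 0, 3])                        # integer representation
one_hot = keras.utils.to_categorical(int_labels, num_classes=4)  # one-hot representation
# one_hot[2] == [0., 0., 1., 0.]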

Now, for my further plans, I need to include the unlabeled data in the training process, but it has to be ignored by the loss function.

What I have tried:

  1. Setting the labels of the unlabeled data to [0,0,0,0] when using categorical_crossentropy as a loss, because I thought my unlabeled data would then be ignored by the loss function. Somehow this changed the predictions after training.
  2. I also tried setting the sample weights of the unlabeled data to 0, but that didn't work either (see the sketch after this list).
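A minimal sketch of the zero-weight idea, assuming -1 marks the unlabeled samples in train_class_labels (the convention I use further below); the -1 entries are swapped for a dummy class so the loss value itself stays finite:

import numpy as np

# hypothetical: mask unlabeled samples via per-sample weights (-1 = unlabeled)
mask_weights = np.where(train_class_labels == -1, 0.0, 1.0)
dummy_labels = np.maximum(train_class_labels, 0)  # replace -1 with a valid dummy class

model.fit(
    x=train_data,
    y=dummy_labels,
    sample_weight=mask_weights,
    batch_size=200,
    epochs=200,
    )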

I concluded that I need to somehow mark my unlabeled data and customize my loss function so that it can be told to ignore those samples. Something like

def custom_loss(y_true, y_pred):
    if y_true belongs to the labeled data:
        return normal loss value
    if y_true belongs to the unlabeled data:
        return 0

These are some snippets that I have found, but they do not seem to work:

import tensorflow as tf
from tensorflow.keras import backend as K
from tensorflow.keras import losses

def custom_loss(y_true, y_pred):
    loss = losses.sparse_categorical_crossentropy(y_true, y_pred)
    return K.switch(K.flatten(K.equal(y_true, -1)), K.zeros_like(loss), loss)

def custom_loss2(y_true, y_pred):
    idx = tf.not_equal(y_true, -1)
    y_true = tf.boolean_mask(y_true, idx)
    y_pred = tf.boolean_mask(y_pred, idx)
    return losses.sparse_categorical_crossentropy(y_true, y_pred)

In those examples, I set the labels of the unlabeled data to -1, so train_class_labels would look something like this: [0,-1,2,0,3, ...]

But when using the first loss function I just get NaNs, and when using the second one I get the following error: Invalid argument: logits and labels must have the same first dimension, got logits shape [1,5000] and labels shape [5000]
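A sketch that might avoid both failure modes (not verified in my full pipeline): replace the -1 labels with a valid dummy class before computing the crossentropy, so no out-of-range index ever produces NaNs, and zero out those samples afterwards instead of boolean-masking tensors of different ranks:

import tensorflow as tf
from tensorflow.keras import losses

def masked_sparse_ce(y_true, y_pred):
    # y_true: integer labels, shape (batch,) or (batch, 1); -1 marks unlabeled samples
    y_true = tf.cast(tf.reshape(y_true, [-1]), tf.int64)
    mask = tf.not_equal(y_true, -1)                        # True for labeled samples
    # use a valid dummy label (0) so the crossentropy never indexes out of range
    safe_labels = tf.where(mask, y_true, tf.zeros_like(y_true))
    loss = losses.sparse_categorical_crossentropy(safe_labels, y_pred)
    return loss * tf.cast(mask, loss.dtype)                # zero out dummy contributions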

Upvotes: 0

Views: 575

Answers (1)

Nikaido

Reputation: 4629

I think that setting the labels to [0,0,0,0] would be just fine, because the loss is calculated as the sum of the log losses of your instances per class (in your case, the loss would be 0 for instances with no label).
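You can verify this directly: with an all-zero target, the per-sample categorical crossentropy -sum(y_true * log(y_pred)) evaluates to exactly 0:

import tensorflow as tf

y_true = tf.constant([[0., 0., 0., 0.]])      # "unlabeled" target
y_pred = tf.constant([[0.1, 0.2, 0.3, 0.4]])
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred).numpy())  # [0.]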

I don't understand why you are including unlabeled data in your training in a supervised setting.

I think that the differences you obtain are due to the batch size and the gradient step. If there are instances that do not contribute to the gradient descent, the computed batch loss will be different than before, and that is why you get different predictions.

Basically, there would be fewer informative instances per batch.

If you use the size of the whole dataset as the batch size, there would be no difference from a previous training without the unlabeled instances (provided that training also used batch size = size of the dataset).
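A toy illustration of that dilution effect, with hypothetical per-sample loss values: the zero-loss unlabeled samples shrink the per-batch mean loss, and with it the gradient step:

import tensorflow as tf

labeled_losses = tf.constant([1.2, 0.8, 1.0])                      # per-sample losses
with_unlabeled = tf.concat([labeled_losses, tf.zeros(3)], axis=0)  # plus zero-loss samples

print(tf.reduce_mean(labeled_losses).numpy())   # 1.0
print(tf.reduce_mean(with_unlabeled).numpy())   # 0.5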

Upvotes: 1
