Deftness

Reputation: 315

Keras Custom Loss for One-Hot Encoded

I currently have a DNN that I trained to predict a one-hot encoded classification of the state a game is in. Essentially, imagine there are three states: 0, 1, or 2.

Now, I would normally use categorical_crossentropy for the loss function, but I realized that not all classifications are equal for my states. For example, predicting state 1 should cost nothing, predicting the expected state should be rewarded, and predicting the wrong state should be penalized, as in the pseudocode below.

I know we can declare custom loss functions in Keras, but I keep getting stuck forming it. Does anyone have suggestions for transforming that pseudocode? I can't tell how I'd do it as a vectorized operation over a batch.

Additional question: I think what I'm really after is a reward function. Is that the same as a loss function? Thanks!

def custom_expectancy(y_expected, y_pred):
    
    # Get 0, 1 or 2
    expected_norm = tf.argmax(y_expected, axis=-1)
    predicted_norm = tf.argmax(y_pred, axis=-1)
    
    # Some pseudo code....
    # Now, if predicted == 1
    #     loss += 0
    # elif predicted == expected
    #     loss -= 3
    # elif predicted != expected
    #     loss += 1
    #
    # return loss

Sources consulted:

https://datascience.stackexchange.com/questions/55215/how-do-i-create-a-keras-custom-loss-function-for-a-one-hot-encoded-binary-classi

Custom loss in Keras with softmax to one-hot

Code Update

import tensorflow as tf
def custom_expectancy(y_expected, y_pred):
    
    # Get 0, 1 or 2
    expected_norm = tf.argmax(y_expected, axis=-1)
    predicted_norm = tf.argmax(y_pred, axis=-1)
    
    results = tf.unstack(expected_norm)
    
    # Some pseudo code....
    # Now, if predicted == 1
    #     loss += 0
    # elif predicted == expected
    #     loss += 3
    # elif predicted != expected
    #     loss -= 1
    
    for idx in range(0, len(expected_norm)):
        predicted = predicted_norm[idx]
        expected = expected_norm[idx]
        
        if predicted == 1: # do nothing
            results[idx] = 0.0
        elif predicted == expected: # reward
            results[idx] = 3.0
        else: # wrong, so we lost
            results[idx] = -1.0
    
    
    return tf.stack(results)

I think this is what I'm after, but I haven't quite figured out how to build the correct tensor (which should be of batch size) to return.

Upvotes: 2

Views: 760

Answers (3)

Yaoshiang

Reputation: 1941

Here's how you want to do it. If your ground truth y_true is dense (shaped N3), you can use a tf.reduce_all(y_true == [0.0, 0.0, 1.0], axis=-1, keepdims=True) and tf.reduce_all(y_true == [1.0, 0.0, 0.0], axis=-1, keepdims=True) to control the if/elif/else. You could further optimize this with a tf.gather.

import tensorflow as tf

def sparse_loss(y_true, y_pred):
  """Calculate the loss for the game. Follows the Keras loss signature.
  
  Args:
    y_true: Sparse tensor of shape N1, where correct prediction
      is encoded as 0, 1, or 2. 
    y_pred: Tensor of shape N3. For each row, the three columns
      represent the predicted probability of each state.
      For example, [0.1, 0.3, 0.6] means, "There's a 10% chance the
      right state is 0, a 30% chance the right state is 1,
      and a 60% chance the right state is 2".
  """

  # This is the unvectorized implementation on individual rows which is more
  # intuitive. But TF requires vectorization. 
  # if y_true == 0:
  #   # Value matrix is shape 3. Broadcasting will occur. 
  #   return -tf.reduce_sum(y_pred * [3.0, 0.0, -1.0])
  # elif y_true == 2:
  #   return -tf.reduce_sum(y_pred * [-1.0, 0.0, 3.0])
  # else:
  #   # According to the rules, this is never the correct
  #   # state the predict so it should never show up.
  #   assert False, f'Impossible state reached. y_true: {y_true}, y_pred: {y_pred}.'


  # We vectorize by calculating the reward for all predictions for two cases:
  # if y_true is zero or if y_true is two. To eliminate this inefficiency, we
  # could use tf.gather to build an N3 shaped matrix to multiply against.
  reward_for_true_zero = tf.reduce_sum(y_pred * [3.0, 0.0, -1.0], axis=-1, keepdims=True) # N1
  reward_for_true_two = tf.reduce_sum(y_pred * [-1.0, 0.0, 3.0], axis=-1, keepdims=True) # N1

  reward = tf.where(y_true == 0.0, reward_for_true_zero, reward_for_true_two) # N1
  return -tf.reduce_sum(reward)
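
For reference, a minimal sketch of the tf.gather variant mentioned above, assuming sparse integer labels and treating state 1 as a neutral row in the reward table (gathered_loss and value_table are illustrative names, not part of the answer's code):

import tensorflow as tf

def gathered_loss(y_true, y_pred):
  """Hypothetical tf.gather variant of sparse_loss.

  y_true: sparse labels of shape (N, 1) or (N,), with values 0, 1, or 2.
  y_pred: predicted probabilities of shape (N, 3).
  """
  # One reward row per true state; state 1 is assumed to be neutral here.
  value_table = tf.constant([[ 3.0, 0.0, -1.0],
                             [ 0.0, 0.0,  0.0],
                             [-1.0, 0.0,  3.0]])
  idx = tf.cast(tf.reshape(y_true, (-1,)), tf.int32)  # (N,)
  values = tf.gather(value_table, idx)                # (N, 3)
  reward = tf.reduce_sum(y_pred * values, axis=-1)    # (N,)
  return -tf.reduce_sum(reward)

This handles all three true states with a single lookup instead of two masked reduce_sum calls.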

Upvotes: 0

Marco Cerliani

Reputation: 22031

The best way to build a conditional custom loss is to use tf.keras.backend.switch without involving loops.

In your case, you should combine 2 switch conditional expressions to obtain the desired results.

The desired loss function can be reproduced in this way:

import tensorflow as tf

def custom_expectancy(y_expected, y_pred):
    
    zeros = tf.cast(tf.reduce_sum(y_pred*0, axis=-1), tf.float32) ### important to produce gradient
    y_expected = tf.cast(tf.reshape(y_expected, (-1,)), tf.float32)
    class_pred = tf.argmax(y_pred, axis=-1)
    class_pred = tf.cast(class_pred, tf.float32)
    
    cond1 = (class_pred != y_expected) & (class_pred != 1)
    cond2 = (class_pred == y_expected) & (class_pred != 1)
    
    res1 = tf.keras.backend.switch(cond1, zeros -1, zeros)
    res2 = tf.keras.backend.switch(cond2, zeros +3, zeros)
    
    return res1 + res2

Here cond1 is the case where the model incorrectly predicts state 0 or 2, and cond2 is the case where it correctly predicts state 0 or 2. The default value is zero, which is returned when neither cond1 nor cond2 is activated.

Note that y_expected can be passed as a simple tensor/array of integer-encoded states (no need to one-hot encode them).
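
If your targets happen to be one-hot encoded already, a small conversion step (hypothetical, not part of the answer) collapses them back to integer states before calling the loss:

import tensorflow as tf

# Collapse one-hot targets back to integer-encoded states.
y_one_hot = tf.constant([[0, 1, 0], [0, 0, 1]])  # states 1 and 2, one-hot
y_expected = tf.argmax(y_one_hot, axis=-1)       # -> [1, 2]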

Here is how the loss function works:

true = tf.constant([[1],    [2],    [1],    [0]    ])  ## no need to one-hot
pred = tf.constant([[0,1,0],[0,0,1],[0,0,1],[0,1,0]])

custom_expectancy(true, pred)

Which returns:

<tf.Tensor: shape=(4,), dtype=float32, numpy=array([ 0.,  3., -1.,  0.], dtype=float32)>

That seems to be consistent with our needs.

To use the loss inside a model:

import numpy as np
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential

X = np.random.uniform(0,1, (1000,10))
y = np.random.randint(0,3, (1000)) ## no need to one-hot

model = Sequential([Dense(3, activation='softmax')])
model.compile(optimizer='adam', loss=custom_expectancy)
model.fit(X,y, epochs=3)

Here is the running notebook.

Upvotes: 1

sashimi

Reputation: 1304

Here is a nice post explaining the concepts of loss functions and cost functions. Multiple answers illustrate how different authors in the field of machine learning treat them.

As for the loss function, you may find a weighted cross-entropy loss useful, where you weight each class in proportion to its frequency in the training set. This could be adapted to satisfy the constraints specified above; a rough sketch follows.
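
A minimal sketch of such a class-weighted categorical cross-entropy, assuming one-hot targets (the weight values and the make_weighted_cce name are purely illustrative):

import tensorflow as tf

def make_weighted_cce(class_weights):
    """Builds a categorical cross-entropy scaled by a per-class weight.

    class_weights: sequence of length 3, e.g. inverse class frequencies
    computed from the training labels (values below are illustrative).
    """
    w = tf.constant(class_weights, dtype=tf.float32)  # (3,)

    def weighted_cce(y_true, y_pred):
        # Per-sample weight = weight of that sample's true (one-hot) class.
        sample_w = tf.reduce_sum(tf.cast(y_true, tf.float32) * w, axis=-1)
        cce = tf.keras.losses.categorical_crossentropy(y_true, y_pred)
        return sample_w * cce

    return weighted_cce

# Example: penalize mistakes on states 0 and 2 more heavily than on state 1.
loss_fn = make_weighted_cce([3.0, 1.0, 3.0])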

Upvotes: 1
