Reputation: 315
I currently have a DNN I trained that makes a prediction of a one-hot encoded classification for states that a game is in. Essentially, imagine there are three states, 0, 1, or 2.
Now, I would normally use categorical_cross_entropy for the loss function, but I realized that not all classifications are equal for my states. For example:
def custom_expectancy(y_expected, y_pred):
    # Get 0, 1 or 2
    expected_norm = tf.argmax(y_expected)
    predicted_norm = tf.argmax(y_pred)
    # Some pseudo code....
    # Now, if predicted == 1
    #     loss += 0
    # elif predicted == expected
    #     loss -= 3
    # elif predicted != expected
    #     loss += 1
    #
    # return loss
I know that we can declare custom loss functions in Keras, but I keep getting stuck writing it. Does anyone have suggestions for how to turn that pseudo code into real code? I can't tell how I'd do it as a vectorized operation.
Additional question: I think I'm essentially after a reward function. Is this the same as a loss function? Thanks!
Sources consulted:
Custom loss in Keras with softmax to one-hot
Code Update
import tensorflow as tf
def custom_expectancy(y_expected, y_pred):
    # Get 0, 1 or 2
    expected_norm = tf.argmax(y_expected)
    predicted_norm = tf.argmax(y_pred)
    results = tf.unstack(expected_norm)
    # Some pseudo code....
    # Now, if predicted == 1
    #     loss += 0
    # elif predicted == expected
    #     loss += 3
    # elif predicted != expected
    #     loss -= 1
    for idx in range(0, len(expected_norm)):
        predicted = predicted_norm[idx]
        expected = expected_norm[idx]
        if predicted == 1:  # do nothing
            results[idx] = 0.0
        elif predicted == expected:  # reward
            results[idx] = 3.0
        else:  # wrong, so we lost
            results[idx] = -1.0
    return tf.stack(results)
I think this is what I'm after, but I haven't quite figured out how to build the correct tensor to return (it should have one entry per sample in the batch).
Upvotes: 2
Views: 760
Reputation: 1941
Here's how you want to do it. The function below assumes sparse labels. If your ground truth y_true is dense (one-hot, shaped N×3), you can instead use tf.reduce_all(y_true == [0.0, 0.0, 1.0], axis=-1, keepdims=True) and tf.reduce_all(y_true == [1.0, 0.0, 0.0], axis=-1, keepdims=True) to control the if/elif/else. You could further optimize this with a tf.gather.
import tensorflow as tf
def sparse_loss(y_true, y_pred):
    """Calculate loss for game. Follows keras loss signature.
    Args:
        y_true: Sparse tensor of shape (N, 1), where the correct state
            is encoded as 0, 1, or 2.
        y_pred: Tensor of shape (N, 3). For each row, the three columns
            represent the predicted probability of each state.
            For example, [0.1, 0.3, 0.6] means, "There's a 10% chance the
            right state is 0; 30% chance the right state is 1,
            and 60% chance the right state is 2".
    """
    # This is the unvectorized implementation on individual rows which is more
    # intuitive. But TF requires vectorization.
    # if y_true == 0:
    #     # Value matrix is shape 3. Broadcasting will occur.
    #     return -tf.reduce_sum(y_pred * [3.0, 0.0, -1.0])
    # elif y_true == 2:
    #     return -tf.reduce_sum(y_pred * [-1.0, 0.0, 3.0])
    # else:
    #     # According to the rules, this is never the correct
    #     # state to predict so it should never show up.
    #     assert False, f'Impossible state reached. y_true: {y_true}, y_pred: {y_pred}.'
    # We vectorize by calculating the reward for all predictions for two cases:
    # if y_true is zero or if y_true is two. To eliminate this inefficiency, we
    # could use tf.gather to build an (N, 3) shaped matrix to multiply against.
    # Cast so the comparison with 0.0 also works for integer labels.
    y_true = tf.cast(y_true, y_pred.dtype)
    reward_for_true_zero = tf.reduce_sum(y_pred * [3.0, 0.0, -1.0], axis=-1, keepdims=True)  # (N, 1)
    reward_for_true_two = tf.reduce_sum(y_pred * [-1.0, 0.0, 3.0], axis=-1, keepdims=True)  # (N, 1)
    reward = tf.where(y_true == 0.0, reward_for_true_zero, reward_for_true_two)  # (N, 1)
    return -tf.reduce_sum(reward)
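For what it's worth, the tf.gather idea mentioned above could look roughly like the sketch below. The names REWARD_MATRIX and sparse_loss_gather are made up for illustration, and the all-zero row for true state 1 is my assumption (the version above instead treats every non-zero label as state 2), so adjust it to your rules.

```python
import tensorflow as tf

# Row i holds the reward for predicting each state when the true state is i.
# Rows 0 and 2 reuse the reward vectors from the code above.
# Row 1 is an ASSUMPTION: zeros, since state 1 is said never to be the correct state.
REWARD_MATRIX = tf.constant([
    [3.0, 0.0, -1.0],
    [0.0, 0.0, 0.0],
    [-1.0, 0.0, 3.0],
])

def sparse_loss_gather(y_true, y_pred):
    """Hypothetical tf.gather variant of sparse_loss (a sketch, not a drop-in)."""
    y_pred = tf.convert_to_tensor(y_pred, dtype=tf.float32)   # (N, 3)
    labels = tf.cast(tf.reshape(y_true, (-1,)), tf.int32)     # (N,)
    # Pick one reward row per sample instead of computing both cases.
    reward_rows = tf.gather(REWARD_MATRIX, labels)            # (N, 3)
    expected_reward = tf.reduce_sum(y_pred * reward_rows, axis=-1)  # (N,)
    # Maximizing reward is the same as minimizing its negative.
    return -tf.reduce_sum(expected_reward)

# Small usage example with two samples (true states 0 and 2).
y_true_demo = tf.constant([[0], [2]])
y_pred_demo = tf.constant([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]])
loss_demo = sparse_loss_gather(y_true_demo, y_pred_demo)
```

Here the first row earns 3 * 0.7 - 1 * 0.1 = 2.0 and the second earns -1 * 0.1 + 3 * 0.6 = 1.7, so the loss comes out at about -3.7. Because the loss is built from the predicted probabilities rather than from an argmax, it stays differentiable with respect to y_pred.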
Upvotes: 0
Reputation: 22031
The best way to build a conditional custom loss is to use tf.keras.backend.switch without involving loops.
In your case, you should combine 2 switch conditional expressions to obtain the desired results.
The desired loss function can be reproduced in this way:
import tensorflow as tf
def custom_expectancy(y_expected, y_pred):
    # Derived from y_pred so the result stays connected to the graph
    zeros = tf.cast(tf.reduce_sum(y_pred*0, axis=-1), tf.float32)
    y_expected = tf.cast(tf.reshape(y_expected, (-1,)), tf.float32)
    class_pred = tf.argmax(y_pred, axis=-1)
    class_pred = tf.cast(class_pred, tf.float32)
    cond1 = (class_pred != y_expected) & (class_pred != 1)  # wrong, and not state 1
    cond2 = (class_pred == y_expected) & (class_pred != 1)  # right, and not state 1
    res1 = tf.keras.backend.switch(cond1, zeros - 1, zeros)
    res2 = tf.keras.backend.switch(cond2, zeros + 3, zeros)
    return res1 + res2
Here cond1 is true when the model incorrectly predicts state 0 or 2, and cond2 is true when the model correctly predicts state 0 or 2. The default value is zero, which is returned when neither cond1 nor cond2 is satisfied.
Note that y_expected can be passed as a simple tensor/array of integer-encoded states (no need to one-hot encode them).
Here is how the loss function works:
true = tf.constant([[1], [2], [1], [0] ]) ## no need to one-hot
pred = tf.constant([[0,1,0],[0,0,1],[0,0,1],[0,1,0]])
custom_expectancy(true, pred)
Which returns:
<tf.Tensor: shape=(4,), dtype=float32, numpy=array([ 0., 3., -1., 0.], dtype=float32)>
That seems to be consistent with our needs.
To use the loss inside a model:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
X = np.random.uniform(0, 1, (1000, 10))
y = np.random.randint(0, 3, (1000))  # no need to one-hot
model = Sequential([Dense(3, activation='softmax')])
model.compile(optimizer='adam', loss=custom_expectancy)
model.fit(X, y, epochs=3)
Here is the running notebook.
Upvotes: 1
Reputation: 1304
Here is a nice post explaining the concepts of the loss function and the cost function. Multiple answers illustrate how different authors in the field of machine learning treat them.
As for the loss function, you may find the following implementation useful. It implements a weighted cross-entropy loss, where each class is weighted in proportion to its weight in the training set. This could be adapted to satisfy the constraints specified above.
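In case it helps, here is a minimal sketch of what a weighted cross-entropy for one-hot labels could look like in TensorFlow. It is not the implementation referred to above, just my own illustration: make_weighted_cce is a made-up helper name, and the weights in the usage lines are placeholder values, not tuned for the game.

```python
import tensorflow as tf

def make_weighted_cce(class_weights):
    """Build a Keras-style loss; class_weights has one weight per class."""
    weights = tf.constant(class_weights, dtype=tf.float32)  # (3,)

    def loss(y_true, y_pred):
        y_true = tf.cast(y_true, tf.float32)                 # one-hot, (N, 3)
        # Clip to avoid log(0).
        y_pred = tf.clip_by_value(tf.cast(y_pred, tf.float32), 1e-7, 1.0)
        # Standard cross-entropy term, scaled per class by its weight.
        per_class = -y_true * tf.math.log(y_pred) * weights  # (N, 3)
        return tf.reduce_sum(per_class, axis=-1)             # (N,)

    return loss

# Usage with placeholder weights (ASSUMED values, for illustration only).
weighted_loss = make_weighted_cce([1.0, 0.5, 2.0])
y_true_demo = tf.constant([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
y_pred_demo = tf.constant([[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]])
per_sample = weighted_loss(y_true_demo, y_pred_demo)
```

Both samples assign probability 0.5 to the true class, but the second sample's loss is twice the first because class 2 carries weight 2.0 in this example. You would pass it to Keras with model.compile(optimizer='adam', loss=weighted_loss).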
Upvotes: 1