Reputation: 48725
I have a basic classification problem, classifying each row into one of 20 classes.
However, there is a twist. For every row, only some of those 20 classes are valid - and this is known upfront.
In TensorFlow 1.0, I have been nullifying the logits of the impossible classes. The only modification is in the loss function:
def getLoss(logits, y, restrictions):
    # restrictions is a boolean mask where True marks an impossible class
    logits = tf.where(restrictions, -1000.0 * tf.ones_like(y), logits)
    return tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=y)

loss = getLoss(logits, y, restrictions)
trainer = tf.train.RMSPropOptimizer(learnRate).minimize(loss)
I have a working solution for TensorFlow 1.0; it is a simple modification of the loss function. However, I want to rewrite it in TensorFlow 2.0 and Keras.
I assume I would need to pass the class restriction matrix into model.fit()
along with the inputs. How would I go about doing this?
One easy solution (also proposed by Frederik) is to concatenate the input and the class restriction matrix, and let the neural network learn the concept of class restriction from scratch.
However, this is not reliable and makes the network unnecessarily large. Is there a better, simpler way with Keras?
Upvotes: 1
Views: 337
Reputation: 11895
Your loss function can be implemented in exactly the same way (tf.nn.softmax_cross_entropy_with_logits_v2 simply became tf.nn.softmax_cross_entropy_with_logits in TensorFlow 2.x):
def getLoss(logits, y, restrictions):
    # True entries in `restrictions` mark impossible classes
    logits = tf.where(restrictions, -1000.0 * tf.ones_like(y, dtype=tf.float32), logits)
    return tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y)
The model can then be defined as follows:
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

x_input = Input(shape=(100,))
y_true = Input(shape=(20,))
restrictions = Input(shape=(20,), dtype=tf.bool)
# ... model definition here
y_pred = Dense(20)(x_input)
model = Model([x_input, restrictions, y_true], y_pred)
To compile the model, attach the loss with add_loss; since the loss is then already part of the graph, no loss argument is passed to compile():
model.add_loss(getLoss(y_pred, y_true, restrictions))
model.compile(optimizer='rmsprop')
Finally, the model can be trained using the fit method. For example:
x = np.random.random((1000, 100))
# boolean mask: True marks an impossible class
restrictions = np.random.binomial(1, p=0.5, size=(1000, 20)).astype(bool)
y = np.random.randint(20, size=1000)
y_onehot = np.eye(20)[y]

model.fit([x, restrictions, y_onehot], epochs=10, batch_size=10)
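Note that since the labels and restrictions are model inputs, predict also expects three arrays, and the mask is applied only inside the loss, so the raw logits must be masked again at inference time before the softmax. A minimal sketch, reusing the arrays above (the dummy label array only satisfies the input signature and does not affect the output):
dummy_y = np.zeros((1000, 20), dtype=np.float32)
raw_logits = model.predict([x, restrictions, dummy_y])
masked_logits = np.where(restrictions, -1000.0, raw_logits)
probs = tf.nn.softmax(masked_logits).numpy()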
Upvotes: 1
Reputation: 22031
Supposing you have to pass the class restriction matrix at inference time...
You can build the restriction operation on the logits manually inside a simple Lambda
layer, then apply a softmax on the restricted logits and use a standard cross-entropy loss.
Here is a dummy example where the mask/restriction for the classes is given in binary format (1 marks a restricted class).
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Lambda, Activation
from tensorflow.keras.models import Model

n_class = 8
n_sample = 10
X = np.random.uniform(0, 1, (n_sample, 30))
y = np.random.randint(0, n_class, (n_sample,))
mask = np.random.randint(0, 2, (n_sample, n_class))  # 1 = restricted class

def mask_logits(logits, mask):
    restrictions = (mask > 0)
    # push restricted logits to a large negative value so softmax assigns them 0
    return tf.keras.backend.switch(restrictions, -1000.0 * tf.ones_like(logits), logits)

inp_x = Input((X.shape[-1],))
inp_mask = Input((n_class,))
logits = Dense(n_class)(inp_x)
out = Lambda(lambda t: mask_logits(*t))([logits, inp_mask])
out = Activation('softmax')(out)

model = Model([inp_x, inp_mask], out)
model.compile('adam', 'sparse_categorical_crossentropy')
model.fit([X, mask], y, epochs=3)
At inference time the prediction can be retrieved in this way:
pred = model.predict([X, mask])
Finally, we run a couple of simple checks:
>>> pred.sum(1)
array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)
Prediction probabilities sum to 1 row-wise
>>> pred == 0
array([[ True, True, False, True, False, False, True, False],
[ True, True, True, True, False, False, True, True],
[False, False, False, True, False, False, True, True],
[False, False, True, False, False, True, True, False],
[False, False, True, False, False, True, True, False],
[False, False, True, True, False, False, False, False],
[False, True, False, True, False, False, True, True],
[ True, False, False, False, False, True, False, True],
[False, True, True, False, False, False, True, False],
[False, True, True, False, True, False, False, False]])
Some predicted probabilities are equal to 0, as specified by our binary mask.
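The zeros are not just rounded for display: a logit of -1000 underflows to exactly 0.0 in the float32 softmax, which is easy to verify:
>>> tf.nn.softmax([-1000.0, 0.0]).numpy()
array([0., 1.], dtype=float32)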
Upvotes: 0
Reputation: 2744
How does this work at inference time, though? Do you also know the class restrictions of a new row at inference time?
If the answer is "yes":
I think you should not feed the entire class restriction matrix as a separate input, but rather concatenate each row with its class restriction vector. So instead of feeding a row with shape (n,), you feed row_plus_class_restrictions with shape (n+20,):
row_feature_0
row_feature_1
...
row_feature_n
0
1
...
1
That way you don't need to nullify any logits either; the model will learn what it should output based on the classification loss.
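A minimal sketch of this concatenation approach (layer sizes and names are illustrative, not from the question):
from tensorflow.keras.layers import Input, Dense, Concatenate
from tensorflow.keras.models import Model

n_features, n_class = 100, 20

inp_row = Input(shape=(n_features,))
inp_restr = Input(shape=(n_class,))         # binary restriction vector per row
h = Concatenate()([inp_row, inp_restr])     # shape (n_features + 20,)
h = Dense(64, activation='relu')(h)         # hidden layer size is an assumption
out = Dense(n_class, activation='softmax')(h)

model = Model([inp_row, inp_restr], out)
model.compile('rmsprop', 'sparse_categorical_crossentropy')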
If the answer is "no":
Then your model doesn't make much sense. The training data would be a bunch of (row, class_restrictions, class_it_should_be) tuples
with dimension (nb_row_features + 20 + 20)
, is that correct? What are you actually trying to train? What is the practical application, and what kind of data is in your rows? I don't understand what you would want if the answer is no.
Upvotes: 0