What loss function to use if the output layer (label) is composed of one-hot vectors and zero vectors?

Question

I am trying to design a classification model based on Deep Learning with TensorFlow and Keras. In my model, the label is a sequence with variable length, for example: ABC, CADB or ABCDB.

For simplicity, in the output layer I use a fixed length (equal to that of the longest sequence) to store all sequences. So if a sequence is shorter than the fixed length, it is represented by one-hot vectors (one for each position in the sequence's actual length) followed by zero vectors (one for each remaining position).

For example, if the fixed length is 5, the sequence CADB is represented by a 4 × 5 matrix (4 classes × 5 positions) like this:

[Image: 4 × 5 label matrix for the sequence CADB]

Please note: the first 4 columns of this matrix are one-hot vectors, each with exactly one entry equal to 1 and all other entries equal to 0. The entries of the last column are all 0, which can be seen as zero padding because the sequence is shorter than the fixed length.
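The encoding described above can be sketched in plain Python as follows. This is only an illustration: the alphabet `ABCD`, its row order, and the helper name `encode` are assumptions, not something taken from my actual model.

```python
CLASSES = "ABCD"  # assumed alphabet, in an assumed row order
MAX_LEN = 5       # fixed length = length of the longest sequence


def encode(sequence, classes=CLASSES, max_len=MAX_LEN):
    """Return a len(classes) x max_len matrix: one-hot columns, then zero columns."""
    if len(sequence) > max_len:
        raise ValueError("sequence is longer than max_len")
    # Start from an all-zero matrix; unused trailing columns stay zero (padding).
    matrix = [[0] * max_len for _ in classes]
    for position, symbol in enumerate(sequence):
        matrix[classes.index(symbol)][position] = 1
    return matrix


for row in encode("CADB"):
    print(row)
# [0, 1, 0, 0, 0]
# [0, 0, 0, 1, 0]
# [1, 0, 0, 0, 0]
# [0, 0, 1, 0, 0]
```

Each of the first four columns contains a single 1, and the fifth column is the all-zero padding.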

If all vectors are one-hot vectors, categorical crossentropy is a good choice for the loss function. But in my situation, some vectors (for example, the 5th column in the image above) have only 0 entries, and categorical crossentropy doesn't work here.
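To make the concern concrete, here is a minimal sketch in plain Python of the textbook per-position categorical crossentropy, `-sum(y_true * log(y_pred))`, applied to the CADB label. The predictions are made up for illustration, the row order A to D is assumed, and `column_crossentropy` is a hypothetical helper, not a Keras function. It also shows one possible way to average only over the non-padded positions; whether that is the right choice for a given model is an open question.

```python
import math

# Label for CADB: 4 classes (rows A-D, assumed order) x 5 positions (columns).
y_true = [
    [0, 1, 0, 0, 0],  # A
    [0, 0, 0, 1, 0],  # B
    [1, 0, 0, 0, 0],  # C
    [0, 0, 1, 0, 0],  # D
]

# Made-up softmax outputs: each column is a distribution that sums to 1.
y_pred = [
    [0.1, 0.7, 0.1, 0.2, 0.25],
    [0.1, 0.1, 0.1, 0.6, 0.25],
    [0.7, 0.1, 0.1, 0.1, 0.25],
    [0.1, 0.1, 0.7, 0.1, 0.25],
]


def column_crossentropy(y_true, y_pred, eps=1e-7):
    """Return -sum_c y_true[c][t] * log(y_pred[c][t]) for every position t."""
    n_classes, n_steps = len(y_true), len(y_true[0])
    return [
        -sum(y_true[c][t] * math.log(max(y_pred[c][t], eps)) for c in range(n_classes))
        for t in range(n_steps)
    ]


losses = column_crossentropy(y_true, y_pred)

# A column is a real position if it contains a 1, and padding if it is all zeros.
mask = [sum(row[t] for row in y_true) for t in range(len(y_true[0]))]

# Average the loss over real positions only, ignoring the padded ones.
masked_mean = sum(l * m for l, m in zip(losses, mask)) / sum(mask)

print(losses)
print(mask)
print(masked_mean)
```

Under this definition the all-zero column contributes a loss of exactly 0 whatever the prediction is, so that position places no constraint on the output, and a plain mean over all 5 positions would be diluted by it.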

So my question is: what loss function should I use in this situation?
