Reputation: 605
I am trying to design a classification model based on Deep Learning with TensorFlow and Keras. In my model, the label is a sequence with variable length, for example: ABC, CADB or ABCDB.
For simplicity, in the output layer I use a fixed length (equal to that of the longest sequence) to store all sequences. So if a sequence is shorter than the fixed length, it is represented by one-hot vectors (for the sequence's actual length) followed by zero vectors (for the remaining positions).
For example, if the fixed length is 5, a sequence CADB is represented by a 4 × 5 matrix like this:
Please note: the first 4 columns of this matrix are one-hot vectors, each of which has one and only one 1 entry, and all other entries are 0s. But the entries of the last column are all 0s, which can be seen as a zero padding because the sequence is not long enough.
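A minimal sketch of the encoding described above, in plain Python. The symbol set `ABCD`, the row order (A, B, C, D from top to bottom), and the helper name `encode` are assumptions made for illustration; only the 4 × 5 shape and the zero-padded last column come from the description.

```python
ALPHABET = "ABCD"  # assumed symbol set; row i of the matrix corresponds to ALPHABET[i]

def encode(sequence, max_len, alphabet=ALPHABET):
    """Return a len(alphabet) x max_len matrix: one-hot columns, then zero padding."""
    if len(sequence) > max_len:
        raise ValueError("sequence is longer than the fixed length")
    matrix = [[0] * max_len for _ in alphabet]
    for col, symbol in enumerate(sequence):
        # Exactly one 1 per used column, in the row of that column's symbol.
        matrix[alphabet.index(symbol)][col] = 1
    return matrix  # columns beyond len(sequence) stay all-zero

for row in encode("CADB", 5):
    print(row)
# [0, 1, 0, 0, 0]   <- A (2nd position)
# [0, 0, 0, 1, 0]   <- B (4th position)
# [1, 0, 0, 0, 0]   <- C (1st position)
# [0, 0, 1, 0, 0]   <- D (3rd position)
```

The fifth column contains only zeros because CADB has four symbols.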
If all vectors are one-hot vectors, categorical crossentropy is a good choice of loss function. But in my situation, some vectors (for example, the 5th column in the image above) have only 0 entries, and categorical crossentropy doesn't work here.
So my question is: what loss function should I use in this situation?
Upvotes: 0
Views: 498
Reputation: 2642
This is what I understood: your output has a fixed sequence length. For example, if the maximum sequence length is 10, then your last layer has an output length of 10. When the generated sequence has a length of only 4, the last 6 outputs will be 0.
This is a multi-label multi-class classification problem. Since you are using Keras, you can use a sigmoid activation in the last Dense layer and binary_crossentropy as the loss.
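To see why this choice copes with the zero-padded column, here is a hand-written sketch of what binary cross-entropy computes. It is plain Python, not Keras code. It treats every matrix entry as an independent 0/1 target, so an all-zero column is a valid target that pushes every output in that column toward 0. The prediction values and the row order A, B, C, D are made up for illustration, and averaging over all entries is assumed to match the reduction you use in practice.

```python
import math

def binary_crossentropy(target, predicted, eps=1e-7):
    """Mean per-entry binary cross-entropy between two same-shaped matrices."""
    total, count = 0.0, 0
    for t_row, p_row in zip(target, predicted):
        for t, p in zip(t_row, p_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count

# Target for CADB padded to length 5 (rows A, B, C, D; the row order is an assumption).
target = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
]

# Hypothetical sigmoid outputs: 0.9 where the target is 1, 0.1 elsewhere,
# including low scores throughout the padded 5th column.
good = [[0.9 if t else 0.1 for t in row] for row in target]

# Same outputs, except the model wrongly predicts symbol A in the padded column.
bad = [row[:] for row in good]
bad[0][4] = 0.9

print(binary_crossentropy(target, good))  # lower loss
print(binary_crossentropy(target, bad))   # higher loss: the padding mistake is penalized
```

Because each entry is scored on its own, the loss stays finite and informative for the padded column, which is the case the question is concerned about.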
I'm not sure which architecture you are using, but for sequence generation a sequence model such as an RNN or LSTM could be a better choice than a simple Dense layer.
Upvotes: 1