Reputation: 241
I am doing text classification with a Convolutional Neural Network. I use health documents (ICD-9-CM codes) for my project, and I use the same model as dennybritz, but my data has 36 labels. I used one-hot encoding to encode my labels.
Here is my problem: when I run data that has one label per document, the accuracy is good, between 0.8 and 1. When I run data that has more than one label per document, the accuracy drops significantly.
For example, a document with the single label "782.0" is encoded as [0 0 1 0 ... 0], while a document with the multiple labels "782.0 V13.09 593.5" is encoded as [1 0 1 0 ... 1].
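A minimal sketch of how I build these vectors (label_to_index here stands for my actual mapping from ICD-9-CM code to column index):

import numpy as np

# Multi-hot encoding: one column per ICD-9-CM code, 36 in my case
def encode_labels(codes, label_to_index, num_classes=36):
    vec = np.zeros(num_classes, dtype=np.float32)
    for code in codes.split():
        vec[label_to_index[code]] = 1.0  # set a 1 for every label present
    return vec

# encode_labels("782.0 V13.09 593.5", label_to_index) -> [1 0 1 0 ... 1]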
Could anyone suggest why this happens and how to improve it?
Upvotes: 1
Views: 1420
Reputation: 16104
The label encoding seems correct. If you have multiple correct labels, [1 0 1 0 ... 1] looks totally fine. The loss function used in Denny's post is tf.nn.softmax_cross_entropy_with_logits, which is the loss function for a multi-class (single-label) problem. From its documentation:
Computes softmax cross entropy between logits and labels.
Measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class).
For a multi-label problem, you should instead use tf.nn.sigmoid_cross_entropy_with_logits:
Computes sigmoid cross entropy given logits.
Measures the probability error in discrete classification tasks in which each class is independent and not mutually exclusive. For instance, one could perform multilabel classification where a picture can contain both an elephant and a dog at the same time.
The inputs to the loss function would be logits (WX) and targets (labels).
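In Denny's TextCNN the change would look roughly like this (a sketch only, assuming self.scores holds the logits and self.input_y the multi-hot targets, as in his code):

# Calculate loss with a sigmoid instead of a softmax cross entropy
with tf.name_scope("loss"):
    losses = tf.nn.sigmoid_cross_entropy_with_logits(logits=self.scores, labels=self.input_y)
    # Sum the per-class losses for each example, then average over the batch
    self.loss = tf.reduce_mean(tf.reduce_sum(losses, axis=1))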
In order to measure the accuracy correctly for a multi-label problem, the code below needs to be changed.
# Calculate Accuracy
with tf.name_scope("accuracy"):
    correct_predictions = tf.equal(self.predictions, tf.argmax(self.input_y, 1))
    self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"), name="accuracy")
The logic of correct_predictions above is incorrect when you can have multiple correct labels. For example, say num_classes=4 and labels 0 and 2 are correct, so your input_y=[1, 0, 1, 0]. The tf.argmax(self.input_y, 1) inside correct_predictions would need to break the tie between index 0 and index 2. I am not sure how tf.argmax breaks ties, but if it does so by choosing the smaller index, then a prediction of label 2 is always considered wrong, which definitely hurts your accuracy measure.
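One way to sidestep the tie-breaking problem is to threshold the sigmoid outputs and compare the predicted label set element-wise. A rough sketch (the 0.5 threshold is an arbitrary choice you would want to tune):

# Multi-label accuracy: predict every class whose sigmoid probability
# exceeds 0.5 and count the element-wise matches against input_y
with tf.name_scope("accuracy"):
    predicted = tf.cast(tf.sigmoid(self.scores) > 0.5, tf.float32)
    correct_predictions = tf.cast(tf.equal(predicted, self.input_y), tf.float32)
    # Fraction of (example, class) decisions that are right; for an
    # exact-match metric, take tf.reduce_min over axis 1 first
    self.accuracy = tf.reduce_mean(correct_predictions, name="accuracy")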
Actually, in a multi-label problem, precision and recall are better metrics than accuracy. You can also consider using precision@k (tf.nn.in_top_k) to report classifier performance.
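Note that tf.nn.in_top_k expects a single integer target per example, so with multi-hot labels you may have to compute precision@k yourself. A hypothetical sketch (num_classes and self.scores as above):

# Precision@k: what fraction of the k highest-scoring classes are true labels
k = 3  # arbitrary choice
_, top_k_idx = tf.nn.top_k(self.scores, k=k)  # [batch, k]
top_k_hot = tf.reduce_sum(tf.one_hot(top_k_idx, depth=num_classes), axis=1)  # [batch, num_classes]
hits = tf.reduce_sum(top_k_hot * self.input_y, axis=1)  # true labels among the top k
precision_at_k = tf.reduce_mean(hits / k)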
Upvotes: 4