Abhishek Patel

Reputation: 814

Multilabel classification converges to all zeroes

I am attempting one-vs-all multilabel classification. I feed a batch of inputs to each classifier along with its expected labels. Each classifier uses a softmax output layer to predict a label as yes or no, and each minimizes its own softmax cross-entropy loss. The classifiers keep reducing their loss at every step, but they end up predicting every label as zero.

I suspect this is because the number of positive examples for each label is very small compared to the size of the entire dataset.

Is this because I'm doing something wrong in the way I train my models, or is it because of the skewed distribution of the data for each individual label?

My plan is to limit the number of negative samples, but I wanted to make sure that is the right direction to take.

Here's the code I am using for each classifier. I have a classifier for every label.

    # Hidden layer: embedding -> hidden_size with sigmoid activation
    self.w1 = tf.Variable(tf.truncated_normal([embedding_size, hidden_size], -0.1, 0.1), dtype=tf.float32, name="weight1")
    self.b1 = tf.Variable(tf.zeros([hidden_size]), dtype=tf.float32, name="bias1")
    self.o1 = tf.sigmoid(tf.matmul(embed, self.w1) + self.b1)

    # Output layer: two logits (label present / absent) fed to a softmax
    self.w2 = tf.Variable(tf.truncated_normal([hidden_size, 2], -0.1, 0.1), dtype=tf.float32, name="weight2")
    self.b2 = tf.Variable(tf.zeros([2]), dtype=tf.float32, name="bias2")  # bias matches the 2 output units
    self.logits = tf.matmul(self.o1, self.w2) + self.b2
    self.prediction = tf.nn.softmax(self.logits, name="prediction")

    # Per-classifier softmax cross-entropy loss, minimized with Adam
    self.loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=self.logits, labels=labels))
    self.optimizer = tf.train.AdamOptimizer(1e-3).minimize(self.loss)

EDIT: Even after switching to a simple multi-label classifier with sigmoid_cross_entropy_with_logits, the predictions still converge to zero. I'm posting the code for this version in case it helps:

    # Inputs: token ids, multi-hot labels, and a pre-trained embedding matrix
    self.inp_x = tf.placeholder(shape=[None], dtype=tf.int32, name="inp_x")
    self.labels = tf.placeholder(shape=[None, num_labels], dtype=tf.float32, name="labels")
    self.embeddings = tf.placeholder(shape=[vocabulary_size, embedding_size], dtype=tf.float32, name="embeddings")
    self.embed = tf.nn.embedding_lookup(self.embeddings, self.inp_x)

    # Hidden layer: embedding -> hidden_size with sigmoid activation
    self.w1 = tf.Variable(tf.truncated_normal([embedding_size, hidden_size], -0.1, 0.1), dtype=tf.float32, name="weight1")
    self.b1 = tf.Variable(tf.zeros([hidden_size]), dtype=tf.float32, name="bias1")
    self.o1 = tf.sigmoid(tf.matmul(self.embed, self.w1) + self.b1)

    # Output layer: one logit per label, squashed independently with a sigmoid
    self.w2 = tf.Variable(tf.truncated_normal([hidden_size, num_labels], -0.1, 0.1), dtype=tf.float32, name="weight2")
    self.b2 = tf.Variable(tf.zeros([num_labels]), dtype=tf.float32, name="bias2")
    self.logits = tf.matmul(self.o1, self.w2) + self.b2
    self.prediction = tf.sigmoid(self.logits, name="prediction")

    # Element-wise sigmoid cross-entropy, averaged over batch and labels
    self.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=self.logits, labels=self.labels))
    self.optimizer = tf.train.AdamOptimizer(1e-3).minimize(self.loss)

Upvotes: 5

Views: 2473

Answers (1)

Desh Raj

Reputation: 113

Since you have not mentioned the actual data distribution, it is difficult to say whether the issue is with your code or with the dataset. However, you can try feeding a set that is uniformly distributed across the classes and check the result. If the problem is indeed a skewed distribution, you can try the following:

  1. Oversampling the positive (minority) classes by copying their instances.
  2. Undersampling the majority class.
  3. Using a weighted loss function. TensorFlow has a built-in function, weighted_cross_entropy_with_logits, which provides this functionality, albeit only for binary classification, and lets you specify the pos_weight you want to assign to the minority class (see the sketch after this list).
  4. You could also filter negative instances manually, but this method requires some domain knowledge.
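As an illustration of option 3, here is a minimal sketch (TF 1.x) of how the loss in your edited snippet could be swapped for the weighted version. Note that pos_counts and neg_counts are assumed to be per-label counts computed beforehand from your training set; they are not part of your original code.

    # Sketch only: pos_counts and neg_counts are hypothetical NumPy arrays of
    # shape [num_labels] holding per-label positive/negative example counts.
    pos_weight = tf.constant(neg_counts / pos_counts, dtype=tf.float32)

    # Drop-in replacement for the sigmoid loss in the snippet above:
    # errors on positive labels are weighted pos_weight times as heavily.
    self.loss = tf.reduce_mean(
        tf.nn.weighted_cross_entropy_with_logits(
            targets=self.labels,      # multi-hot labels, shape [None, num_labels]
            logits=self.logits,
            pos_weight=pos_weight))
    self.optimizer = tf.train.AdamOptimizer(1e-3).minimize(self.loss)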

Upvotes: 3
