Abhishek Patel

Reputation: 814

Multilabel classification converges to all zeroes

I am attempting one-vs-all multilabel classification. I feed a batch of inputs to each classifier along with its expected labels. Each classifier uses a softmax output layer to predict a label as yes or no, and each minimizes its own softmax cross-entropy loss. The classifiers keep reducing their loss at every step, but they end up predicting every label as zero.

I suspect this is because the number of positive examples for each label is very small compared to the size of the entire dataset.

Is this because I'm doing something wrong in the way I train my models, or is it because of the skewed distribution of the data for each individual label?

My plan is to limit the number of negative samples, but I wanted to make sure that is the right direction to take.

Here's the code I am using for each classifier. I have a classifier for every label.

    # Hidden layer: embedding -> hidden_size with sigmoid activation
    self.w1 = tf.Variable(tf.truncated_normal([embedding_size, hidden_size], -0.1, 0.1), dtype=tf.float32, name="weight1")
    self.b1 = tf.Variable(tf.zeros([hidden_size]), dtype=tf.float32, name="bias1")
    self.o1 = tf.sigmoid(tf.matmul(embed, self.w1) + self.b1)

    # Output layer: two logits (label present / absent) fed to a softmax
    self.w2 = tf.Variable(tf.truncated_normal([hidden_size, 2], -0.1, 0.1), dtype=tf.float32, name="weight2")
    self.b2 = tf.Variable(tf.zeros([2]), dtype=tf.float32, name="bias2")  # bias matches the 2 output units
    self.logits = tf.matmul(self.o1, self.w2) + self.b2
    self.prediction = tf.nn.softmax(self.logits, name="prediction")

    # Per-classifier softmax cross-entropy loss, minimized with Adam
    self.loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=self.logits, labels=labels))
    self.optimizer = tf.train.AdamOptimizer(1e-3).minimize(self.loss)

EDIT: Even after switching to a simple multi-label classifier with sigmoid_cross_entropy_with_logits, the predictions still converge to zero. I'm posting the code for this version in case it helps:

    # Inputs: token ids, multi-hot labels, and a pre-trained embedding matrix
    self.inp_x = tf.placeholder(shape=[None], dtype=tf.int32, name="inp_x")
    self.labels = tf.placeholder(shape=[None, num_labels], dtype=tf.float32, name="labels")
    self.embeddings = tf.placeholder(shape=[vocabulary_size, embedding_size], dtype=tf.float32, name="embeddings")
    self.embed = tf.nn.embedding_lookup(self.embeddings, self.inp_x)

    # Hidden layer: embedding -> hidden_size with sigmoid activation
    self.w1 = tf.Variable(tf.truncated_normal([embedding_size, hidden_size], -0.1, 0.1), dtype=tf.float32, name="weight1")
    self.b1 = tf.Variable(tf.zeros([hidden_size]), dtype=tf.float32, name="bias1")
    self.o1 = tf.sigmoid(tf.matmul(self.embed, self.w1) + self.b1)

    # Output layer: one logit per label, squashed independently with a sigmoid
    self.w2 = tf.Variable(tf.truncated_normal([hidden_size, num_labels], -0.1, 0.1), dtype=tf.float32, name="weight2")
    self.b2 = tf.Variable(tf.zeros([num_labels]), dtype=tf.float32, name="bias2")
    self.logits = tf.matmul(self.o1, self.w2) + self.b2
    self.prediction = tf.sigmoid(self.logits, name="prediction")

    # Element-wise sigmoid cross-entropy, averaged over batch and labels
    self.loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=self.logits, labels=self.labels))
    self.optimizer = tf.train.AdamOptimizer(1e-3).minimize(self.loss)

Upvotes: 5

Views: 2473

Answers (1)

Desh Raj

Reputation: 113

Since you have not mentioned the actual data distribution, it is difficult to say whether the issue is with your code or with the dataset. However, you can try feeding a set that is uniformly distributed across the classes and check the result. If the problem is indeed a skewed distribution, you can try the following:

  1. Oversampling the positive (minority) classes by copying their instances.
  2. Undersampling the majority class.
  3. Using a weighted loss function. TensorFlow has a built-in function, weighted_cross_entropy_with_logits, which provides this functionality, albeit only for binary classification, and lets you specify the pos_weight you want to assign to the minority class (see the sketch after this list).
  4. You could also filter negative instances manually, but this method requires some domain knowledge.
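As an illustration of option 3, here is a minimal sketch (TF 1.x) of how the loss in your edited snippet could be swapped for the weighted version. Note that pos_counts and neg_counts are assumed to be per-label counts computed beforehand from your training set; they are not part of your original code.

    # Sketch only: pos_counts and neg_counts are hypothetical NumPy arrays of
    # shape [num_labels] holding per-label positive/negative example counts.
    pos_weight = tf.constant(neg_counts / pos_counts, dtype=tf.float32)

    # Drop-in replacement for the sigmoid loss in the snippet above:
    # errors on positive labels are weighted pos_weight times as heavily.
    self.loss = tf.reduce_mean(
        tf.nn.weighted_cross_entropy_with_logits(
            targets=self.labels,      # multi-hot labels, shape [None, num_labels]
            logits=self.logits,
            pos_weight=pos_weight))
    self.optimizer = tf.train.AdamOptimizer(1e-3).minimize(self.loss)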

Upvotes: 3
