user13959036

Tensorflow ValueError: logits and labels must have the same shape ((None, 2) vs (None, 1))

I'm new to machine learning and thought I'd start with Keras. I'm classifying movie reviews into three classes (positive as 1, neutral as 0 and negative as -1) using binary cross-entropy. When I try to wrap my Keras model with a TensorFlow estimator, I get the error below.
The code is as follows:

import tensorflow as tf
import numpy as np
import pandas as pd

csvfilename_train = 'train(cleaned).csv'
csvfilename_test = 'test(cleaned).csv'

# Read .csv files as pandas dataframes
df_train = pd.read_csv(csvfilename_train)
df_test = pd.read_csv(csvfilename_test)

train_sentences  = df_train['Comment'].values
test_sentences  = df_test['Comment'].values

# Extract labels from dataframes
train_labels = df_train['Sentiment'].values
test_labels = df_test['Sentiment'].values

vocab_size = 10000
embedding_dim = 16
max_length = 30
trunc_type = 'post'
oov_tok = '<OOV>'

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

tokenizer = Tokenizer(num_words = vocab_size, oov_token = oov_tok)
tokenizer.fit_on_texts(train_sentences)
word_index = tokenizer.word_index
sequences = tokenizer.texts_to_sequences(train_sentences)
padded = pad_sequences(sequences, maxlen = max_length, truncating = trunc_type)

test_sequences = tokenizer.texts_to_sequences(test_sentences)
test_padded = pad_sequences(test_sequences, maxlen = max_length)

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length = max_length),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(6, activation = 'relu'),
    tf.keras.layers.Dense(2, activation = 'sigmoid'),
])
model.compile(loss = 'binary_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

num_epochs = 10
model.fit(padded, train_labels, epochs = num_epochs, validation_data = (test_padded, test_labels))

And the error is as follows:

---> 10 model.fit(padded, train_labels, epochs = num_epochs, validation_data = (test_padded, test_labels))

And finally this:

ValueError: logits and labels must have the same shape ((None, 2) vs (None, 1))

Upvotes: 3

Views: 15317

Answers (1)

Iswariya Manivannan

Reputation: 724

There are several issues with your code.

  1. You are using the wrong loss function. Binary cross-entropy is meant for binary classification problems, but here you are doing multi-class classification (3 classes: positive, neutral and negative).
  2. Using the sigmoid activation function in the last layer is wrong, because the sigmoid maps logits to the range between 0 and 1, while your class labels are -1, 0 and 1. The network can therefore never output a negative value and will never learn to predict the negative class (see the short demo below).

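As a quick illustration of point 2, a sigmoid squashes any input into the open interval (0, 1), so it can never produce -1:

import tensorflow as tf

print(tf.math.sigmoid([-10.0, 0.0, 10.0]).numpy())
# prints roughly [4.5e-05, 0.5, 0.99995] -- always strictly between 0 and 1
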
The right approach would be to view this as a multi-class classification problem and use the categorical cross-entropy loss accompanied by the softmax activation in your last Dense layer with 3 units (one for each class). Note that one-hot encoded labels have to be used for the categorical cross-entropy loss and integer labels can be used along with the sparse categorical cross-entropy loss.

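If you prefer to keep integer labels, a minimal sketch of the sparse variant (reusing the variables from the question and assuming the final layer has already been changed to a 3-unit softmax as described above) could look like this:

# Shift labels from {-1, 0, 1} to {0, 1, 2} so they are valid class indices
train_labels_shifted = train_labels + 1
test_labels_shifted = test_labels + 1

model.compile(loss = 'sparse_categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])
model.fit(padded, train_labels_shifted, epochs = num_epochs,
          validation_data = (test_padded, test_labels_shifted))
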
For example, with the categorical cross-entropy loss, the final Dense layer becomes:

tf.keras.layers.Dense(3, activation = 'softmax')

Note the 3 changes (put together in the full sketch after this list):

  • loss function changed to categorical cross-entropy

  • No. of units in final Dense layer is 3

  • One-hot encoding of labels is required and can be done using tf.one_hot. Shift the -1/0/1 labels to 0/1/2 first, otherwise the -1 labels fall out of range and are encoded as all-zero vectors:

    tf.one_hot(train_labels + 1, 3)

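Putting the three changes together, a full sketch (again reusing the variables from the question) might look like:

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim, input_length = max_length),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(6, activation = 'relu'),
    tf.keras.layers.Dense(3, activation = 'softmax'),   # 3 units, one per class
])
model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])

# One-hot encode the labels, shifting {-1, 0, 1} to {0, 1, 2} first
train_labels_onehot = tf.one_hot(train_labels + 1, 3)
test_labels_onehot = tf.one_hot(test_labels + 1, 3)

model.fit(padded, train_labels_onehot, epochs = num_epochs,
          validation_data = (test_padded, test_labels_onehot))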

Upvotes: 11
