Fatih Enes

Reputation: 95

Getting higher accuracy with softmax + categorical_crossentropy compared to sigmoid + binary_crossentropy in LSTM

I am using Word2Vec embeddings and training an LSTM model. My data has only two labels, about 10k instances, and about 45k features. My embedding matrix has shape (58137, 100), and I trained it myself. I am keeping all the parameters the same except for swapping softmax + categorical_crossentropy for sigmoid + binary_crossentropy. Since I have two labels, shouldn't I be getting better accuracy with sigmoid + binary_crossentropy? Here are my models.

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.callbacks import EarlyStopping

model = Sequential()
#model.add(Embedding(maximum_words_number, e_dim, input_length=X.shape[1]))
model.add(Embedding(58137, 100, weights=[embeddings], input_length=X_train.shape[1], trainable=False))  # -> This adds the Word2Vec embeddings
model.add(LSTM(10, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(10, return_sequences=False, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='softmax'))
#opt = SGD(lr=0.05)
model.compile(loss='categorical_crossentropy', optimizer='Nadam', metrics=['accuracy'])
epochs = 4
batch_size = 100
model_outcome = model.fit(X_train, y_train_binary, epochs=epochs, batch_size=batch_size,
                          validation_split=0.2,
                          callbacks=[EarlyStopping(monitor='val_loss', patience=1, min_delta=0.0001)])


model = Sequential()
#model.add(Embedding(maximum_words_number, e_dim, input_length=X.shape[1]))
model.add(Embedding(58137, 100, weights=[embeddings], input_length=X_train.shape[1], trainable=False))  # -> This adds the Word2Vec embeddings
model.add(LSTM(10, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(10, return_sequences=False, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='sigmoid'))
#opt = SGD(lr=0.05)
model.compile(loss='binary_crossentropy', optimizer='Nadam', metrics=['accuracy'])
epochs = 4
batch_size = 100
model_outcome = model.fit(X_train, y_train_binary, epochs=epochs, batch_size=batch_size,
                          validation_split=0.2,
                          callbacks=[EarlyStopping(monitor='val_loss', patience=1, min_delta=0.0001)])
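
For reference, y_train_binary holds one-hot labels. A minimal sketch of the assumed preprocessing (using keras.utils.to_categorical; y_train is the raw 0/1 label vector):

from keras.utils import to_categorical

# Raw 0/1 labels become one-hot rows: 0 -> [1, 0], 1 -> [0, 1]
y_train_binary = to_categorical(y_train, num_classes=2)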

My accuracy and the other evaluation scores (precision, recall, and F1) on the test set are higher with the first model, the one using softmax + categorical_crossentropy. Can someone explain why this is the case? And if there is something wrong with the models I created, please let me know.

Thank you.

Upvotes: 1

Views: 135

Answers (1)

Timbus Calin

Reputation: 15003

The accuracies should be the same (or very similar, given that you do not set seeds for exact reproducibility), but in your comparison you made a mistake on this line:

model.add(Dense(2, activation='sigmoid'))

Here, for binary_crossentropy with a sigmoid activation, you need 1 neuron instead of 2: in the binary case a single sigmoid unit represents the same probability as a 2-way softmax, as the quick check below shows.
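
A plain-numpy sketch (not from the original post) showing that a 2-way softmax reduces to a sigmoid over the logit difference:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Two logits under softmax carry the same information as their
# difference under a sigmoid, so both heads can represent the same classifier.
z = np.array([0.3, 1.7])
print(softmax(z)[1])         # P(class 1) from a 2-unit softmax -> 0.8021...
print(sigmoid(z[1] - z[0]))  # same value from a single sigmoid unit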

Therefore,

model.add(Dense(1, activation='sigmoid'))

Of course, you also need to provide the labels in the matching format: sigmoid + BCE expects a flat vector such as [0, 1, 1, 1, ...], whereas softmax + CCE expects one-hot rows such as [[1, 0], [0, 1], [0, 1], [0, 1], ...].
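
Putting it together, a minimal sketch of the corrected sigmoid model (y_train_flat is a hypothetical name for the flattened labels, recovered from the one-hot y_train_binary with np.argmax):

import numpy as np

# Flatten one-hot rows [[1,0],[0,1],...] back into a 0/1 vector [0,1,...]
y_train_flat = np.argmax(y_train_binary, axis=1)

model = Sequential()
model.add(Embedding(58137, 100, weights=[embeddings], input_length=X_train.shape[1], trainable=False))
model.add(LSTM(10, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(10, return_sequences=False, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))  # one neuron for binary_crossentropy
model.compile(loss='binary_crossentropy', optimizer='Nadam', metrics=['accuracy'])
model.fit(X_train, y_train_flat, epochs=4, batch_size=100, validation_split=0.2)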

Upvotes: 1
