Fatih Enes

Reputation: 95

Getting higher accuracy with softmax + categorical_crossentropy compared to sigmoid + binary_crossentropy in LSTM

I am using Word2Vec embeddings and training an LSTM model. My data has only two labels, about 10k instances, and about 45k features. My embedding matrix has shape (58137, 100), and I trained it myself. I am keeping all the parameters the same except for swapping softmax + categorical_crossentropy for sigmoid + binary_crossentropy. Since I have two labels, shouldn't I be getting better accuracy with sigmoid + binary_crossentropy? Here are my models.

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.callbacks import EarlyStopping

model = Sequential()
#model.add(Embedding(maximum_words_number, e_dim, input_length=X.shape[1]))
model.add(Embedding(58137, 100, weights=[embeddings], input_length=X_train.shape[1], trainable=False))  # -> This adds the Word2Vec embeddings
model.add(LSTM(10, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(10, return_sequences=False, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='softmax'))
#opt = SGD(lr=0.05)
model.compile(loss='categorical_crossentropy', optimizer='Nadam', metrics=['accuracy'])
epochs = 4
batch_size = 100
model_outcome = model.fit(X_train, y_train_binary, epochs=epochs, batch_size=batch_size,
                          validation_split=0.2,
                          callbacks=[EarlyStopping(monitor='val_loss', patience=1, min_delta=0.0001)])


model = Sequential()
#model.add(Embedding(maximum_words_number, e_dim, input_length=X.shape[1]))
model.add(Embedding(58137, 100, weights=[embeddings], input_length=X_train.shape[1], trainable=False))  # -> This adds the Word2Vec embeddings
model.add(LSTM(10, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(10, return_sequences=False, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2, activation='sigmoid'))
#opt = SGD(lr=0.05)
model.compile(loss='binary_crossentropy', optimizer='Nadam', metrics=['accuracy'])
epochs = 4
batch_size = 100
model_outcome = model.fit(X_train, y_train_binary, epochs=epochs, batch_size=batch_size,
                          validation_split=0.2,
                          callbacks=[EarlyStopping(monitor='val_loss', patience=1, min_delta=0.0001)])
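
For reference, y_train_binary holds one-hot labels. A minimal sketch of the assumed preprocessing (using keras.utils.to_categorical; y_train is the raw 0/1 label vector):

from keras.utils import to_categorical

# Raw 0/1 labels become one-hot rows: 0 -> [1, 0], 1 -> [0, 1]
y_train_binary = to_categorical(y_train, num_classes=2)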

My accuracy and the other evaluation scores (precision, recall, and F1) on the test set are higher with the first model, the one using softmax + categorical_crossentropy. Can someone explain why this is the case? And if there is something wrong with the models I created, please let me know.

Thank you.

Upvotes: 1

Views: 135

Answers (1)

Timbus Calin

Reputation: 15003

The accuracies should be the same (or very similar, given that you do not set seeds for exact reproducibility), but in your comparison you made a mistake on this line:

model.add(Dense(2, activation='sigmoid'))

Here, for binary_crossentropy with a sigmoid activation, you need 1 neuron instead of 2: in the binary case a single sigmoid unit represents the same probability as a 2-way softmax, as the quick check below shows.
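
A plain-numpy sketch (not from the original post) showing that a 2-way softmax reduces to a sigmoid over the logit difference:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Two logits under softmax carry the same information as their
# difference under a sigmoid, so both heads can represent the same classifier.
z = np.array([0.3, 1.7])
print(softmax(z)[1])         # P(class 1) from a 2-unit softmax -> 0.8021...
print(sigmoid(z[1] - z[0]))  # same value from a single sigmoid unit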

Therefore,

model.add(Dense(1, activation='sigmoid'))

Of course, you also need to provide the labels in the matching format: sigmoid + BCE expects a flat vector such as [0, 1, 1, 1, ...], whereas softmax + CCE expects one-hot rows such as [[1, 0], [0, 1], [0, 1], [0, 1], ...].
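
Putting it together, a minimal sketch of the corrected sigmoid model (y_train_flat is a hypothetical name for the flattened labels, recovered from the one-hot y_train_binary with np.argmax):

import numpy as np

# Flatten one-hot rows [[1,0],[0,1],...] back into a 0/1 vector [0,1,...]
y_train_flat = np.argmax(y_train_binary, axis=1)

model = Sequential()
model.add(Embedding(58137, 100, weights=[embeddings], input_length=X_train.shape[1], trainable=False))
model.add(LSTM(10, return_sequences=True, dropout=0.2, recurrent_dropout=0.2))
model.add(LSTM(10, return_sequences=False, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))  # one neuron for binary_crossentropy
model.compile(loss='binary_crossentropy', optimizer='Nadam', metrics=['accuracy'])
model.fit(X_train, y_train_flat, epochs=4, batch_size=100, validation_split=0.2)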

Upvotes: 1
