Sonius

Reputation: 1657

Which loss function should I use in my LSTM and why?

I am trying to understand Keras and LSTMs step by step. Right now I am building an LSTM where the input is a sentence and the output is an array of five values, each of which can be 0 or 1.

Example: Input sentence: 'I hate cookies' Output example: [0,0,1,0,1]

For this, I am using the Keras library.

Now I am not sure which loss function I should use. So far I only know two of the predefined loss functions reasonably well, and neither seems to fit my example:

  • Binary cross-entropy: good if my output is just a single 0 or 1
  • Categorical cross-entropy: good if my output is an array with exactly one 1 and all other values 0

Neither function seems to make sense for my example. Which would you use, and why?

Edit

Another question: which activation function would you use in Keras?

Upvotes: 1

Views: 19190

Answers (5)

oladimeji

Reputation: 1

For regression problems in deep learning, mean squared error (MSE) is the most commonly used loss function. For categorical problems, where you want the output to be 1 or 0 (true or false), binary cross-entropy is preferable.
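The distinction above can be sketched in plain NumPy; the data here is made up purely for illustration, not taken from any real model:

```python
import numpy as np

# Regression: mean squared error penalizes distance on a continuous scale.
y_true = np.array([2.0, 3.5, 5.0])
y_pred = np.array([2.5, 3.0, 4.0])
mse = np.mean((y_true - y_pred) ** 2)  # (0.25 + 0.25 + 1.0) / 3 = 0.5

# Binary classification: cross-entropy penalizes confident wrong probabilities.
labels = np.array([1.0, 0.0, 1.0])
probs = np.array([0.9, 0.1, 0.8])
bce = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))
```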

Upvotes: 0

Sonius

Reputation: 1657

I've found a really good link myself that explains why the best choice here is "binary_crossentropy".

The reason is that each value in the array can be 0 or 1, so each position is effectively its own binary problem.

I've tried it as well. With my dataset I got 92% accuracy using binary cross-entropy; with categorical cross-entropy I only got 81%.
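The reasoning above can be sketched in plain NumPy: binary cross-entropy scores each of the five positions independently, which is exactly what a target like [0,0,1,0,1] needs. The probability values below are invented for illustration:

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Element-wise binary cross-entropy averaged over the array,
    mirroring what Keras's 'binary_crossentropy' computes."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([0, 0, 1, 0, 1], dtype=float)
good = np.array([0.1, 0.2, 0.9, 0.1, 0.8])  # close to the target
bad = np.array([0.9, 0.8, 0.1, 0.9, 0.2])   # far from the target

# Predictions near the target give a lower loss than predictions far from it.
assert binary_crossentropy(y_true, good) < binary_crossentropy(y_true, bad)
```

In Keras this corresponds to a final layer like `Dense(5, activation='sigmoid')` compiled with `loss='binary_crossentropy'`.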

Edit

I forgot to add the link. It gives good explanations of multiple input/output models and which loss functions to use:

https://towardsdatascience.com/deep-learning-which-loss-and-activation-functions-should-i-use-ac02f1c56aa8

Upvotes: 0

user11107939

Reputation: 27

As a primer: cross-entropy loss, or log loss, measures the performance of a classification model whose output is a probability value between 0 and 1.

Cross-entropy loss increases as the predicted probability diverges from the actual label. So predicting a probability of .012 when the actual observation label is 1 would be bad and result in a high loss value.

A perfect model would have a log loss of 0. For the LSTM model you might or might not need this loss function. Here is a link to answer your question in more detail.
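The 0.012 example above works out like this; a quick numeric check with the standard log-loss formula, not tied to any particular model:

```python
import math

def log_loss(label, prob):
    """Cross-entropy (log loss) for a single binary prediction."""
    return -(label * math.log(prob) + (1 - label) * math.log(1 - prob))

# Predicting 0.012 when the true label is 1: confidently wrong, large loss.
print(round(log_loss(1, 0.012), 2))  # 4.42

# A confident correct prediction is close to the ideal loss of 0.
print(round(log_loss(1, 0.99), 2))   # 0.01
```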

Upvotes: 1

liamconnell

Reputation: 3

You'll want to use a logistic (sigmoid) activation. This squashes each logit into the range (0, 1), which represents the probability of that category.

Then use categorical cross entropy. This will not make your model a single class classifier since you are using the logistic activation rather than the softmax activation.

As a rule of thumb:

  • logistic activation pushes values between 0 and 1
  • softmax pushes values between 0 and 1 AND makes them a valid probability distribution (sum to 1)
  • cross entropy calculates the difference between distributions of any type.
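The first two bullets can be demonstrated with a small NumPy sketch; the logit values are arbitrary:

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: each value independently mapped into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    """Softmax: values in (0, 1) AND summing to 1 (a probability distribution)."""
    e = np.exp(z - np.max(z))  # shift logits for numerical stability
    return e / e.sum()

logits = np.array([2.0, -1.0, 0.5])

s = sigmoid(logits)  # each entry in (0, 1); the sum is unconstrained
p = softmax(logits)  # each entry in (0, 1); the sum is exactly 1

assert np.all((s > 0) & (s < 1))
assert abs(p.sum() - 1.0) < 1e-9
```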

Upvotes: 0

Sssssuppp

Reputation: 711

This link should give you an idea of what cross-entropy does and when it is a good choice. Activation functions are usually chosen experimentally; there are quite a few in Keras that you could try out for your scenario.

Please do refer to this Stanford video on YouTube and this blog; both will give you a basic understanding of how a loss function is chosen.

Good Luck!

Upvotes: 0
