RaduS

Reputation: 2555

Keras multiple binary outputs

Can someone help me understand this problem a bit better? I need to train a neural network that outputs 200 mutually independent categories, where each category is a percentage ranging from 0 to 1. This looks to me like a binary_crossentropy problem, but every example I find online uses binary_crossentropy with a single output. Since my network has 200 outputs, would applying binary_crossentropy still be correct?

This is what I have in mind. Is this a correct approach, or should I change it?

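    from tensorflow.keras.layers import Input, Dense
    from tensorflow.keras.models import Model

    input_shape = 100  # number of input features (placeholder value)

    inputs = Input(shape=(input_shape,))
    hidden = Dense(2048, activation='relu')(inputs)
    hidden = Dense(2048, activation='relu')(hidden)
    # 200 independent sigmoid units, one per category
    output = Dense(200, name='output_cat', activation='sigmoid')(hidden)
    model = Model(inputs=inputs, outputs=[output])
    loss_map = {'output_cat': 'binary_crossentropy'}
    model.compile(loss=loss_map, optimizer="sgd", metrics=['mae', 'accuracy'])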
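For reference, this is what binary cross-entropy computes on a multi-unit sigmoid output: each of the 200 units is scored as its own yes/no prediction and the per-unit losses are averaged, so fractional targets between 0 and 1 are allowed. Below is a minimal pure-Python sketch of that computation; the function name and the `eps` clipping constant are illustrative assumptions, not the Keras implementation.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over independent outputs (illustrative sketch).

    y_true: targets in [0, 1]; fractional ("soft") targets are allowed
    y_pred: sigmoid outputs, one per output unit
    eps:    clipping constant (assumed value) that keeps log() finite
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p away from exactly 0 and 1
        # each unit is scored independently of all the others
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(y_true)  # average over the output units


# three of the 200 outputs, with fractional targets
print(binary_crossentropy([0.2, 0.9, 0.5], [0.25, 0.85, 0.5]))
```

With fractional targets the loss does not reach zero; for each unit it is smallest when the prediction equals the target.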

Upvotes: 15

Views: 8373

Answers (5)

Sandeep Mandia

Reputation: 1

binary_crossentropy with a sigmoid activation is used for binary (positive vs. negative) classification, whereas your case is multi-class classification. For multi-class classification, categorical_crossentropy with a softmax activation is used. The sigmoid activation gives the probability that the input belongs to the positive class, while softmax gives the probability that the input belongs to each class. The class with the highest probability is assigned to the input.
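To make the sigmoid/softmax distinction concrete: the sigmoid scores each output on its own, so the outputs are not constrained to sum to 1, while softmax normalizes all outputs into a single distribution. Which one fits depends on whether the categories are mutually exclusive. A small pure-Python sketch (the function names are illustrative):

```python
import math


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for very negative z
    return e / (1.0 + e)


def sigmoid_each(logits):
    """Sigmoid applied to every logit independently (one yes/no per unit)."""
    return [_sigmoid(z) for z in logits]


def softmax(logits):
    """Normalize logits into one distribution over exclusive classes."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, -1.0]
print(sum(sigmoid_each(logits)))  # about 1.88: no constraint to sum to 1
print(sum(softmax(logits)))       # 1.0 (up to rounding)
```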

Upvotes: 0

Troy D

Reputation: 2245

I know this is an old question, but I believe the accepted answer is incorrect, and the most upvoted answer is workable but not optimal. The original poster's method is the correct way to solve this problem. The output is 200 independent probabilities between 0 and 1, so the output layer should be a dense layer with 200 neurons and a sigmoid activation. It's not a categorical_crossentropy problem, because the 200 categories are not mutually exclusive. There's also no reason to split the output with a Lambda layer when a single dense layer will do. Here's another way to write the same model using the Keras Sequential API:

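    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import Dense
    n_input = 100  # number of input features (placeholder value)

    model = Sequential()
    model.add(Dense(2048, input_dim=n_input, activation='relu'))
    # input_dim is only needed on the first layer
    model.add(Dense(2048, activation='relu'))
    model.add(Dense(200, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])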
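With a 200-unit sigmoid output each unit is read on its own, so a single prediction can switch on any number of categories (or none), unlike an argmax over a softmax. If hard yes/no labels are needed rather than the raw probabilities, a minimal sketch is to threshold each unit; the 0.5 threshold and the helper name are illustrative choices, not part of the answer.

```python
def active_categories(probs, threshold=0.5):
    """Indices of the outputs whose probability is above `threshold`.

    Each sigmoid output is judged independently, so zero, one or many
    categories can be active for the same input.
    """
    return [i for i, p in enumerate(probs) if p > threshold]


# e.g. the first five of the 200 outputs for one input
prediction = [0.91, 0.08, 0.77, 0.50, 0.12]
print(active_categories(prediction))  # [0, 2]
```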

Upvotes: 14

blackHoleDetector

Reputation: 3033

For multi-category classification problems, you should use categorical_crossentropy rather than binary_crossentropy. With it, when your model classifies an input, it gives a probability distribution over all 200 categories, and the category that receives the highest probability is the output for that input.

You can see this when you call model.predict(). If you call it on a single input, for example, and print the result, you will see 200 percentages that sum to 1. The hope is that one of them is much higher than the others, which signals that the model thinks there is a strong probability that this is the correct category for that input.
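The behaviour described above (one distribution over all categories, with the top probability taken as the prediction) can be sketched in pure Python. This assumes a one-hot target, in which case categorical cross-entropy reduces to the negative log-probability of the true category; the helper names and `eps` value are illustrative.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a target distribution and a softmax output."""
    # eps (assumed value) keeps log() finite for zero probabilities
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


def predicted_class(y_pred):
    """Index of the most probable category (argmax)."""
    return max(range(len(y_pred)), key=lambda i: y_pred[i])


probs = [0.05, 0.80, 0.15]   # sums to 1, as a softmax output would
target = [0.0, 1.0, 0.0]     # one-hot: category 1 is correct
print(predicted_class(probs))                   # 1
print(categorical_crossentropy(target, probs))  # equals -log(0.80)
```

This setup only fits when exactly one category is correct for each input.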

This video may help clarify the prediction part. The predictions are printed starting around 3:17, but for full context you'll need to watch from the beginning.

Upvotes: -2

deepit

Reputation: 141

To optimize for multiple independent binary classification problems (as opposed to a multi-class problem, where you can use categorical_crossentropy) in Keras, you could do the following. Here I use 2 independent binary outputs as an example, but you can extend this to as many as you need:

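    from tensorflow.keras import Model, optimizers
    from tensorflow.keras.layers import Input, Dense, Lambda

    input_shape = 100  # number of input features (placeholder value)
    inputs = Input(shape=(input_shape,))
    hidden = Dense(2048, activation='relu')(inputs)
    hidden = Dense(2048, activation='relu')(hidden)
    output = Dense(units=2, activation='sigmoid')(hidden)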

Here you split your output using Keras's Lambda layer:

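    # slice along the last axis; each output keeps shape (batch, 1)
    output_1 = Lambda(lambda x: x[..., :1])(output)
    output_2 = Lambda(lambda x: x[..., 1:])(output)

    adad = optimizers.Adadelta()  # optimizer passed to compile below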

Your model's output becomes a list of the independent outputs:

    model = Model(inputs, [output_1, output_2])

You compile the model with one loss function per output, passed as a list. (In fact, if you pass only a single loss function, I believe Keras applies it to each output independently.)

    model.compile(optimizer=adad, loss=['binary_crossentropy','binary_crossentropy'])
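When a model has several outputs, Keras minimizes the weighted sum of the per-output losses (each `loss_weights` entry defaults to 1). Assuming that behaviour, splitting an n-unit sigmoid output into n single-unit outputs gives n times the loss of the unsplit output, whose binary cross-entropy is averaged over its units. A pure-Python sketch of that comparison (the helper names are illustrative):

```python
import math


def bce(t, p, eps=1e-7):
    """Binary cross-entropy of a single output unit."""
    p = min(max(p, eps), 1.0 - eps)  # eps is an assumed clipping value
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))


def single_output_loss(y_true, y_pred):
    """One n-unit sigmoid output: per-unit losses are averaged."""
    return sum(bce(t, p) for t, p in zip(y_true, y_pred)) / len(y_true)


def split_outputs_loss(y_true, y_pred, loss_weights=None):
    """n single-unit outputs: per-output losses are summed with weights."""
    if loss_weights is None:
        loss_weights = [1.0] * len(y_true)  # default weight of 1 per output
    return sum(w * bce(t, p) for w, t, p in zip(loss_weights, y_true, y_pred))


y_true = [1.0, 0.0]
y_pred = [0.7, 0.4]
print(split_outputs_loss(y_true, y_pred) / single_output_loss(y_true, y_pred))  # 2.0
```

Under these assumptions the two setups optimize the same objective up to a constant factor of n, which mainly rescales the gradients; passing `loss_weights` of 1/n to `compile` would make the loss values match.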

Upvotes: 14

pyan

Reputation: 3707

When there are multiple classes, categorical_crossentropy should be used. Refer to another answer here.

Upvotes: -2
