Álvaro

Reputation: 2069

Cutoff on Neural Network regression predictions

Context: I have a set of documents, each with two associated probability values: the probability of belonging to class A and the probability of belonging to class B. The classes are mutually exclusive, and the probabilities add up to one. So, for instance, document D has the probabilities (0.6, 0.4) as ground truth.

Each document is represented by the tf-idf values of the terms it contains, normalized from 0 to 1. I also tried doc2vec (normalized from -1 to 1) and a couple of other methods.
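For reference, here is a minimal sketch of how features and soft-label targets like the ones described above could be built, assuming scikit-learn and NumPy; docs and p_class_a are placeholder names, not part of the original code:

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["first document text", "second document text"]  # raw documents
p_class_a = np.array([0.6, 0.3])                         # ground-truth P(class A) per document

# l2-normalised tf-idf rows: all entries are non-negative and at most 1
vectorizer = TfidfVectorizer(norm='l2')
X = vectorizer.fit_transform(docs).astype(np.float32).toarray()

# Each target row is the full distribution (P(A), P(B)); rows sum to one
y = np.column_stack([p_class_a, 1.0 - p_class_a]).astype(np.float32)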

I built a very simple Neural Network to predict this probability distribution.

This is the code I wrote using nolearn:

# Imports assumed by this snippet; `es` is a module providing a custom
# EarlyStopping handler used via nolearn's on_epoch_finished hook.
import lasagne
from lasagne import layers
import nolearn.lasagne

net = nolearn.lasagne.NeuralNet(
    layers=[('input', layers.InputLayer),
            ('hidden1', layers.DenseLayer),
            ('output', layers.DenseLayer)],
    input_shape=(None, X_train.shape[1]),
    hidden1_num_units=1,
    output_num_units=2,
    output_nonlinearity=lasagne.nonlinearities.softmax,
    objective_loss_function=lasagne.objectives.binary_crossentropy,
    max_epochs=50,
    on_epoch_finished=[es.EarlyStopping(patience=5, gamma=0.0001)],
    regression=True,
    update=lasagne.updates.adam,
    update_learning_rate=0.001,
    verbose=2)
net.fit(X_train, y_train)
y_true, y_pred = y_test, net.predict(X_test)

My problem is: my predictions have a cutoff point, and no prediction goes below that point (see the plot to understand what I mean). The plot shows the difference between the true probability and my predictions; the closer a point is to the red line, the better the prediction. Ideally, all the points would lie on the line. How can I solve this, and why is it happening?

Edit: actually I solved the problem by simply removing the hidden layer:

# Same imports and EarlyStopping handler as above; only the hidden layer is removed.
net = nolearn.lasagne.NeuralNet(
    layers=[('input', layers.InputLayer),
            ('output', layers.DenseLayer)],
    input_shape=(None, X_train.shape[1]),
    output_num_units=2,
    output_nonlinearity=lasagne.nonlinearities.softmax,
    objective_loss_function=lasagne.objectives.binary_crossentropy,
    max_epochs=50,
    on_epoch_finished=[es.EarlyStopping(patience=5, gamma=0.0001)],
    regression=True,
    update=lasagne.updates.adam,
    update_learning_rate=0.001,
    verbose=2)
net.fit(X_train, y_train)
y_true, y_pred = y_test, net.predict(X_test)

But I still fail to understand why I had this problem and why removing the hidden layer solved it. Any ideas?

Here is the new plot:

Upvotes: 3

Views: 565

Answers (1)

fang jack

Reputation: 83

I think your training set output values should be [0,1] or [1,0]; [0.6,0.4] is not suited for softmax/cross-entropy.
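
For illustration, a minimal NumPy sketch of turning the soft targets into the one-hot labels suggested above (y_soft is a hypothetical example array):

import numpy as np

y_soft = np.array([[0.6, 0.4],
                   [0.2, 0.8]], dtype=np.float32)

# Replace each row with a one-hot vector at its argmax
y_hard = np.zeros_like(y_soft)
y_hard[np.arange(len(y_soft)), y_soft.argmax(axis=1)] = 1.0
# y_hard -> [[1., 0.], [0., 1.]]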

Upvotes: 0
