Reputation: 2069
Context: I have a set of documents, each with two associated probability values: the probability of belonging to class A and the probability of belonging to class B. The classes are mutually exclusive and the probabilities add up to one. So, for instance, document D has the ground-truth probabilities (0.6, 0.4).
Each document is represented by the tf-idf of the terms it contains, normalized from 0 to 1. I also tried doc2vec (normalized from -1 to 1) and a couple of other methods.
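For concreteness, here is a hypothetical toy example of what the inputs and targets look like (the values below are made up just for illustration; the real matrices are much larger):

import numpy as np

# Hypothetical toy data: 3 documents, 5 tf-idf features each,
# values already normalized to [0, 1].
X_train = np.array([[0.10, 0.00, 0.75, 0.30, 0.05],
                    [0.00, 0.60, 0.10, 0.00, 0.90],
                    [0.40, 0.20, 0.00, 0.85, 0.15]], dtype=np.float32)

# Ground-truth probability distributions over (class A, class B);
# each row sums to 1, e.g. document D -> (0.6, 0.4).
y_train = np.array([[0.6, 0.4],
                    [0.1, 0.9],
                    [0.5, 0.5]], dtype=np.float32)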
I built a very simple Neural Network to predict this probability distribution.
This is the code I wrote using nolearn:
import lasagne
import nolearn.lasagne
from lasagne import layers
# `es` is my own helper module that provides the EarlyStopping callback

net = nolearn.lasagne.NeuralNet(
    layers=[('input', layers.InputLayer),
            ('hidden1', layers.DenseLayer),
            ('output', layers.DenseLayer)],
    input_shape=(None, X_train.shape[1]),
    hidden1_num_units=1,
    output_num_units=2,
    output_nonlinearity=lasagne.nonlinearities.softmax,
    objective_loss_function=lasagne.objectives.binary_crossentropy,
    max_epochs=50,
    on_epoch_finished=[es.EarlyStopping(patience=5, gamma=0.0001)],
    regression=True,
    update=lasagne.updates.adam,
    update_learning_rate=0.001,
    verbose=2)
net.fit(X_train, y_train)
y_true, y_pred = y_test, net.predict(X_test)
My problem is that my predictions have a cutoff point and no prediction falls below it (see the plot below). The plot compares the true probability with my predictions: the closer a point is to the red line, the better the prediction. Ideally all the points would lie on the line. Why is this happening, and how can I solve it?
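For reference, the plot is just a scatter of true versus predicted class-A probability; a minimal sketch of how it can be reproduced (assuming matplotlib and that y_true and y_pred are NumPy arrays of shape (n, 2)):

import matplotlib.pyplot as plt

# Scatter of true vs. predicted probability for class A on the test set.
plt.scatter(y_true[:, 0], y_pred[:, 0], s=10, alpha=0.5)

# Red identity line: perfect predictions would fall exactly on it.
plt.plot([0, 1], [0, 1], color='red')

plt.xlabel('true probability (class A)')
plt.ylabel('predicted probability (class A)')
plt.show()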
Edit: actually I solved the problem by simply removing the hidden layer:
net = nolearn.lasagne.NeuralNet(
    layers=[('input', layers.InputLayer),
            ('output', layers.DenseLayer)],
    input_shape=(None, X_train.shape[1]),
    output_num_units=2,
    output_nonlinearity=lasagne.nonlinearities.softmax,
    objective_loss_function=lasagne.objectives.binary_crossentropy,
    max_epochs=50,
    on_epoch_finished=[es.EarlyStopping(patience=5, gamma=0.0001)],
    regression=True,
    update=lasagne.updates.adam,
    update_learning_rate=0.001,
    verbose=2)
net.fit(X_train, y_train)
y_true, y_pred = y_test, net.predict(X_test)
But I still fail to understand why I had this problem and why removing the hidden layer solved it. Any ideas?
Here is the new plot:
Upvotes: 3
Views: 565
Reputation: 83
I think your training set output values should be [0, 1] or [1, 0]; a soft target like [0.6, 0.4] is not well suited for softmax with cross-entropy.
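For example, a minimal sketch of what I mean, converting the soft targets into hard one-hot labels before training (assuming NumPy and that y_train holds the (A, B) probabilities):

import numpy as np

# Turn soft targets such as [0.6, 0.4] into hard one-hot labels [1, 0]
# by taking the most probable class for each document.
hard_classes = np.argmax(y_train, axis=1)                  # e.g. [0, 1, ...]
y_train_hard = np.eye(2, dtype=np.float32)[hard_classes]   # one-hot rows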
Upvotes: 0