Reputation: 1037
I used MLPClassifier from sklearn to build a neural network to predict the outcome of horse racing. However, when I used the predict_proba() function to predict each horse's probability of winning, I found that the probabilities within a race did not always sum to 1. Sometimes the sum was 0.8xx, 1.1xxx or 1.2xxx; in the worst cases it was 2.5xx or 0.3xxx.
No matter how I tuned the model, this still happened in some predictions. I also applied MinMaxScaler to the data before feeding it into the model.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPClassifier

rdx = rdx.fillna(value=-999)  # -999 means missing data
x = np.array(rdx)  # rdx holds the features
y = np.array(rdy)  # rdy holds the labels

# Scale features
scaler = MinMaxScaler()
scaler.fit(x)
x = scaler.transform(x)

# Build network
mlp = MLPClassifier(activation='relu', alpha=1e-4, hidden_layer_sizes=(20, 20),
                    random_state=1, max_iter=1000, verbose=10,
                    learning_rate_init=.1)
mlp.fit(x, y)
Features (rdx): [screenshot]
Labels (rdy): [screenshot]
Result: [screenshot]
Each red box contains the predicted probabilities of all horses in one race, but they do not sum to one. Sometimes it is even worse, e.g. 3.5 or 0.5.
What can I do to prevent it?
Upvotes: 1
Views: 4807
Reputation: 561
I think the issue is how you have structured your labels. You are predicting, for each horse, the probability that that horse will win, and those per-horse probabilities are not constrained to sum to 1.
Take a look at the example below: https://scikit-learn.org/stable/auto_examples/neural_networks/plot_mnist_filters.html#sphx-glr-auto-examples-neural-networks-plot-mnist-filters-py
There, the shapes of y_train and y_test are (60000,) and (10000,) respectively, i.e. each label is a single class value, so predict_proba returns rows that sum to 1.
But if you change y_train and y_test to one-hot encoded vectors (like you have in your data) and then train a new MLP model on the transformed labels, you will see that the predicted probabilities no longer sum to 1.
I have modified the example from that link to show what I mean:
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import OneHotEncoder
# Load data from https://www.openml.org/d/554
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
X = X / 255.
# rescale the data, use the traditional train/test split
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]
# mlp = MLPClassifier(hidden_layer_sizes=(100, 100), max_iter=400, alpha=1e-4,
# solver='sgd', verbose=10, tol=1e-4, random_state=1)
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=10, alpha=1e-4,
solver='sgd', verbose=None, tol=1e-4, random_state=1,
learning_rate_init=.1)
mlp.fit(X_train, y_train)
print(y_train.shape)
# with 1-D class labels, the rows of predict_proba sum to 1
print(mlp.predict_proba(X_test[:10]).sum(axis=1))
# one-hot encode the labels, like the labels in the question
enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(y_train.reshape(-1, 1))
y_train_transformed = enc.transform(y_train.reshape(-1, 1)).toarray()
y_test_transformed = enc.transform(y_test.reshape(-1, 1)).toarray()
# mlp = MLPClassifier(hidden_layer_sizes=(100, 100), max_iter=400, alpha=1e-4,
# solver='sgd', verbose=10, tol=1e-4, random_state=1)
mlp_new = MLPClassifier(hidden_layer_sizes=(50,), max_iter=10, alpha=1e-4,
solver='sgd', verbose=None, tol=1e-4, random_state=1,
learning_rate_init=.1)
mlp_new.fit(X_train, y_train_transformed)
print(y_train_transformed.shape)
# with one-hot labels, MLPClassifier treats the task as multilabel,
# so the rows of predict_proba no longer sum to 1
print(mlp_new.predict_proba(X_test[:10]).sum(axis=1))
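In short, if you want predict_proba to return a proper distribution over classes, keep the labels as a single column of class values instead of one-hot columns. A minimal sketch (assuming your rdy is a one-hot array, as it appears to be in your screenshots; the names here are only illustrative) that collapses it back into a single class column before fitting:
import numpy as np

# hypothetical one-hot label matrix: each row contains exactly one 1
rdy_onehot = np.array([[0, 1, 0],
                       [1, 0, 0],
                       [0, 0, 1]])

# collapse the one-hot columns into a single class column so that
# MLPClassifier treats the task as single-label multiclass
y_single = rdy_onehot.argmax(axis=1)  # -> array([1, 0, 2])

# mlp.fit(x, y_single)  # predict_proba(x) rows will then sum to 1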
Upvotes: 3