Susie

Reputation: 21

Why is the probability predicted by my Keras model always zero?

I've built a model to predict loan suitability on a Kaggle dataset here

from sklearn.preprocessing import StandardScaler
from keras.models import Sequential
from keras.layers import Dense

dataset = df.values
X = dataset[:,0:11].astype(float)  # first 11 columns are the features
Y = dataset[:,11]                  # last column is the label
scaler = StandardScaler()
X = scaler.fit_transform(X)        # scaler is fitted on the training data here
model = Sequential()
model.add(Dense(5, input_dim=11, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, Y, epochs=150, batch_size=10, verbose=0)
scores = model.evaluate( X, Y, verbose=0)
print("%s: %.2f%%" % (model.metrics_names[1], scores[1]*100))
model.save("model.h5")

This model reaches an accuracy of 81.43% on the data it was trained on. The problem arises when I try to make a prediction with it. Here I pass the third row of the dataset to the model as an array, and the predicted probability is zero, as it is for every other row I try.

model = load_model('model.h5')
X = np.array([[0, 1, 0, 0, 1, 3000, 0, 66, 360, 1, 0]], dtype=np.float32)
scaler = StandardScaler()
X = scaler.fit_transform(X)
X = scaler.transform(X.reshape(1, -1))
pred = model.predict(X)
print(X)
print("Probability that eligibility = 1:")
print(pred)

I get the output:

[[ 0.000e+00 -1.000e+00 -1.000e+00  0.000e+00  0.000e+00 -4.583e+03
  -1.508e+03 -1.280e+02 -3.600e+02 -1.000e+00 -1.000e+00]]
Probability that eligibility = 1:
[[0.]]

I have not been able to find a solution here on Stack Overflow or on other sites.

Upvotes: 0

Views: 380

Answers (2)

Yefet

Reputation: 2086

Do not fit a new scaler object on the new data. You need to save the StandardScaler you fitted on the training data alongside your model, then load it and use it to transform the new data.

Save it:

from pickle import dump
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = scaler.fit_transform(X)  # fit on the training data only
with open('scaler.pkl', 'wb') as f:
    dump(scaler, f)

Then load it when you want to predict:

from pickle import load
import numpy as np

with open('scaler.pkl', 'rb') as f:
    scaler = load(f)
X = np.array([[0, 1, 0, 0, 1, 3000, 0, 66, 360, 1, 0]], dtype=np.float32)
X = scaler.transform(X)  # transform only; keep the result and do not refit
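
A quick way to check the round trip is to confirm that the reloaded scaler reproduces the transform of the scaler that was fitted at training time. The sketch below is only an illustration: the training values are made up, and the temporary file path is an assumption rather than anything from the question.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training data standing in for the real feature matrix
X_train = np.array([[0.0, 3000.0],
                    [1.0, 4500.0],
                    [1.0, 6000.0]])

# Fit once, on the training data
scaler = StandardScaler().fit(X_train)

# Save the fitted scaler (temporary path used only for this sketch)
path = os.path.join(tempfile.mkdtemp(), "scaler.pkl")
with open(path, "wb") as f:
    pickle.dump(scaler, f)

# Later, at prediction time: load and transform, never refit
with open(path, "rb") as f:
    loaded = pickle.load(f)

new_row = np.array([[0.0, 3000.0]])
same = np.allclose(loaded.transform(new_row), scaler.transform(new_row))
print(same)
```

If the two transforms agree, the loaded scaler carries the training mean and std, which is what the model expects at prediction time.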

Upvotes: 1

B Douchet

Reputation: 1020

You're standardizing the training data, which is great. However, you're predicting with values that are standardized incorrectly. When you standardize the training data, you calculate the mean and std of each column and use them to rescale that column.

However, the prediction step goes wrong because you fit a new scaler on the single row you want to predict, so the mean and std come from that one row instead of from the training data.
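
To make the difference concrete, here is a small sketch with made-up numbers (they are illustrative and not taken from the Kaggle dataset). A scaler fitted on one row uses that row as its own mean, so every feature comes out as zero, whereas a scaler fitted on the training data places the row relative to the training distribution.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative training data (made-up values): 3 samples, 2 features
X_train = np.array([[1000.0, 120.0],
                    [3000.0, 360.0],
                    [5000.0, 600.0]])
new_row = np.array([[3000.0, 120.0]])

# Wrong: fitting on the single row makes the row its own mean
wrong = StandardScaler().fit_transform(new_row)

# Right: fit on the training data, then only transform the new row
train_scaler = StandardScaler().fit(X_train)
right = train_scaler.transform(new_row)

print(wrong)  # every feature is zero
print(right)  # first feature sits at the training mean, second is below it
```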

The correct training process is:

  1. Calculate the mean and std of every column of your training dataset
  2. Standardize each column using its own mean and std:

X_standard = (X - mean_column) / std_column

  3. Train your model

The correct prediction process is:

  1. Select a row and standardize each element with the corresponding column mean and std calculated in step 1 of training
  2. Predict
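
The two processes above can be sketched in plain NumPy. This is a minimal illustration with made-up data, assuming no column has zero standard deviation (otherwise the division would fail); the names mean_column and std_column follow the formula above, and the rest are hypothetical.

```python
import numpy as np

# Made-up training matrix: rows are samples, columns are features
X_train = np.array([[1.0, 200.0],
                    [2.0, 400.0],
                    [3.0, 600.0]])

# Training step 1: per-column statistics from the training data only
mean_column = X_train.mean(axis=0)
std_column = X_train.std(axis=0)  # population std, as StandardScaler uses

# Training step 2: standardize the training data
X_standard = (X_train - mean_column) / std_column

# Prediction: reuse the same mean_column and std_column for the new row
new_row = np.array([[2.0, 600.0]])
new_row_standard = (new_row - mean_column) / std_column

# new_row_standard is what would be passed to model.predict
print(new_row_standard)
```

Storing mean_column and std_column with the model serves the same purpose as saving a fitted StandardScaler.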

Upvotes: 0
