yeetfan

Reputation: 11

My RandomForest keeps returning the exact same probabilities for model.predict_proba() regardless of input

The code is supposed to predict likelihood of diabetes given parameters like glucose, blood pressure, BMI and age:

I first had to trim out the columns I didn't need:

df = pd.read_csv('diabetes.csv')
keep_col = ['Glucose', 'BloodPressure', 'BMI', 'Age', 'Outcome']
df = df[keep_col]
df.to_csv('newFile.csv', index=False)

Then I evened out the data set, because there were twice as many patients who did not have diabetes:

shuffled_df = df.sample(frac=1, random_state=4)
diabetes_df = shuffled_df.loc[shuffled_df['Outcome'] == 1]
no_diabetes_df = shuffled_df.loc[shuffled_df['Outcome'] == 0].sample(n=684, random_state=42)
df = pd.concat([diabetes_df, no_diabetes_df])

Making the training and testing sets:

X = df.iloc[:,:-1].values
Y = df.iloc[:,-1].values

X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.25)

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

Training the model:

model = RandomForestClassifier(n_estimators=1, criterion='entropy', random_state=1)
model.fit(X_train, Y_train)

Checking accuracy on the training set; this usually returns anywhere from around 0.95 to 1:

model.score(X_train, Y_train)

Printing number of true negatives, true positives, false negatives, and false positives:

cm = confusion_matrix(Y_test, model.predict(X_test))

TN = cm[0][0]
TP = cm[1][1]
FN = cm[1][0]
FP = cm[0][1]

print(cm)

print('Model Test Accuracy = {}'.format((TP + TN) / (TP + TN + FN + FP)))

The test accuracy is usually above 80%.

Finally, when I go use the model to make a new prediction such as:

model.predict_proba([[140,77,25,30]])

It always returns the same value, such as array([[0.3, 0.6]]), even when I switch glucose from 140 to 190, or BMI from 25 to 30, etc. The only time the probabilities change is when I change the number of estimators, but even then they don't vary with different inputs.

Any help with this problem would be much appreciated!

Upvotes: 1

Views: 970

Answers (1)

Markus Eyting

Reputation: 39

As commented, a random forest typically consists of many trees (n_estimators), so you should raise this to e.g. 100. With so few variables it is not unlikely that you get the same probabilities for different X, as they might all end up in the same leaf. Have you tried changing the X values more drastically, e.g. setting them all to 0, or to very small vs. very high values? And what does model.predict_proba(X_test) return? Also, I don't know how deep your tree is, so you might have to increase max_depth too in order to get more heterogeneity.
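To illustrate the point, here is a minimal sketch on synthetic data (the features and the decision rule are made up to stand in for your glucose/BP/BMI/age columns, since I don't have your diabetes.csv): a single tree outputs coarse leaf frequencies, while 100 trees average their votes and respond to the input.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for (glucose, blood pressure, BMI, age).
rng = np.random.default_rng(0)
X = rng.uniform([70, 50, 15, 20], [200, 110, 45, 80], size=(1000, 4))
# Outcome loosely tied to glucose and BMI so the forest has signal to learn.
y = ((X[:, 0] > 140) | (X[:, 2] > 35)).astype(int)

# One tree (your original setting) gives coarse, often identical probabilities.
one_tree = RandomForestClassifier(n_estimators=1, random_state=1).fit(X, y)
# Many trees average their votes, so the probabilities move with the input.
forest = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

low = forest.predict_proba([[100, 77, 25, 30]])[0]
high = forest.predict_proba([[190, 77, 25, 30]])[0]
print(one_tree.predict_proba([[100, 77, 25, 30]]), low, high)
```

With the larger forest, the predicted probability of class 1 rises clearly when glucose goes from 100 to 190, which is the behaviour you were expecting.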

Just FYI, here is a nice guide to tuning the random forest's hyperparameters: https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74
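The approach in that guide can be sketched with scikit-learn's RandomizedSearchCV; the data and the parameter ranges below are illustrative assumptions, not values taken from the guide.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Small synthetic data set, just to make the search runnable.
rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 4))
y = (X[:, 0] + X[:, 2] > 1).astype(int)

# A few hyperparameters worth searching over (ranges are arbitrary examples).
param_dist = {
    "n_estimators": [100, 200, 400],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

# Randomized search samples n_iter combinations and cross-validates each.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=1),
    param_dist, n_iter=5, cv=3, random_state=1,
)
search.fit(X, y)
print(search.best_params_)
```

search.best_params_ then holds the best combination found, which you would plug into the final model.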

Upvotes: 1
