Keithx

Reputation: 3148

Adding Probability to the predicted value

I have a testDF and am trying to do binary classification (target values 0 or 1).

I also have a trainDF with the same structure, in which the bad column is filled in for training purposes.

I build the target and training arrays from trainDF:

target = trainDF.bad.values
train = trainDF.drop('bad', axis=1).values

Then I create the logistic regression model and split the data into training and validation parts:

from sklearn import linear_model
from sklearn.model_selection import train_test_split

model = linear_model.LogisticRegression(C=1e5)
TRNtrain, TRNtest, TARtrain, TARtest = train_test_split(train, target, test_size=0.3, random_state=0)

Then I fit on the training split and predict probabilities for the held-out split:

model.fit(TRNtrain, TARtrain)
pred_scr = model.predict_proba(TRNtest)[:, 1]

Then I fit on the whole training set and predict the bad value for the test set:

model.fit(train, target)
test = testDF.drop('bad', axis=1).values
testDF.bad=model.predict(test)

I receive a df with the bad column filled in.

My question: how can I add the logistic regression probability that bad = 1 as an additional column? What steps should I take for that?

Any help would be greatly appreciated!

Upvotes: 0

Views: 4177

Answers (2)

Ally Ansari

Reputation: 47

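The solution in the other answer raises an error, and plain column assignment can also hide an index mismatch between testDF and the predict_proba output.

The following gives an incorrect result when the index of testDF is not the default 0..n-1, because the new DataFrame gets a fresh index and assignment aligns on it: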

y_pred_prob_df = pd.DataFrame(model.predict_proba(test))
testDF['Prob_0'] = y_pred_prob_df[0]
testDF['Prob_1'] = y_pred_prob_df[1]
print(test.shape)

Validation:

y_pred_test = model.predict(test)  # predicted labels, assumed to come from the question's model
predicted = testDF.loc[y_pred_test == 1].reset_index()
prob_predicted = y_pred_prob_df.loc[y_pred_test == 1].reset_index()

concat_all shows whether the indices match. Plain column assignment aligns on the index, so mismatched indices can put non-matching or missing data on the same row. A concat makes the mismatch visible, so it can be handled:

concat_all = pd.concat([predicted, prob_predicted], axis=1)
print(concat_all.shape)

# Rows where either probability is missing indicate a mismatch; drop them.
concat_all['a'] = concat_all[0] + concat_all[1]
concat_all = concat_all[~concat_all['a'].isnull()]

print(concat_all.shape)
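To make the index issue concrete, here is a minimal sketch with made-up numbers. The frame, its index values, and the proba array are hypothetical stand-ins for testDF and model.predict_proba(test), not data from the question. It shows how a probability DataFrame built without an index ends up misaligned, and two ways that should avoid it.

```python
import numpy as np
import pandas as pd

# Hypothetical test frame whose index is NOT 0..n-1 (e.g. after filtering rows).
testDF = pd.DataFrame({"x1": [0.1, 0.2, 0.3]}, index=[10, 11, 12])

# Stand-in for model.predict_proba(test): one row per sample, one column per class.
proba = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])

# Built without an index, the frame gets labels 0, 1, 2. Assignment aligns on
# labels, none of which exist in testDF, so every value becomes NaN.
misaligned = pd.DataFrame(proba)
testDF["Prob_1_wrong"] = misaligned[1]

# Option 1: reuse testDF's index so the labels line up.
aligned = pd.DataFrame(proba, index=testDF.index)
testDF["Prob_1"] = aligned[1]

# Option 2: assign the raw NumPy column, which is positional and skips alignment.
testDF["Prob_1_positional"] = proba[:, 1]

print(testDF)
```

If testDF already has the default 0..n-1 index, both approaches give the same result and the mismatch never appears, which is why it is easy to miss.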

Upvotes: 1

James

Reputation: 36598

The .predict method selects the most probable class for each input row. If you want to access the probabilities you can use:

log_prob = model.predict_log_proba(test)  # Log of probability estimates.
prob = model.predict_proba(test)   # Probability estimates.

Both return one column per class, in the order given by model.classes_, so select the column for class 1 and assign it to a new data frame column:

testDF['bad_prob'] = model.predict_proba(test)[:, 1]
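For reference, a self-contained sketch of the whole flow. The data here is randomly generated and purely illustrative (the column names x1 and x2 are assumptions, since the question's columns are not shown); only the bad column, the C=1e5 setting, and the variable names follow the question. It looks up the position of class 1 in model.classes_ rather than hard-coding the column.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in data; the real trainDF/testDF come from the question.
rng = np.random.RandomState(0)
trainDF = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
trainDF["bad"] = (trainDF["x1"] + 0.5 * rng.normal(size=200) > 0).astype(int)
testDF = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
testDF["bad"] = np.nan  # unknown labels to be filled in

# Same preparation steps as in the question.
target = trainDF["bad"].values
train = trainDF.drop("bad", axis=1).values
model = LogisticRegression(C=1e5)
model.fit(train, target)

test = testDF.drop("bad", axis=1).values
proba = model.predict_proba(test)  # shape (n_samples, n_classes)

# Find which column of proba corresponds to class 1.
pos_col = list(model.classes_).index(1)

testDF["bad"] = model.predict(test)      # hard 0/1 prediction
testDF["bad_prob"] = proba[:, pos_col]   # probability that bad == 1

print(testDF.head())
```

Assigning the NumPy column directly is positional, so it does not depend on the index of testDF.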

Upvotes: 1
