Reputation: 3148
I have a testDF like this and am trying to do a binary classification (0 or 1):
I also have a trainDF with the same structure, where the bad column is already filled in for training purposes.
I build the target and training arrays from trainDF:
target = trainDF.bad.values
train = trainDF.drop('bad', axis=1).values
Then I create the logistic regression model and split the data into training and hold-out sets:
from sklearn import linear_model
from sklearn.model_selection import train_test_split
model = linear_model.LogisticRegression(C=1e5)
TRNtrain, TRNtest, TARtrain, TARtest = train_test_split(train, target, test_size=0.3, random_state=0)
Then I fit on the training split and predict probabilities on the hold-out split:
model.fit(TRNtrain, TARtrain)
pred_scr = model.predict_proba(TRNtest)[:, 1]
Then I fit on the whole training set and predict the bad value for the test set:
model.fit(train, target)
test = testDF.drop('bad', axis=1).values
testDF.bad=model.predict(test)
I get a DataFrame with the bad column filled in:
My question: how can I add the logistic regression probability of bad = 1 as an additional column? What steps should I take?
Any help would be greatly appreciated!
Upvotes: 0
Views: 4177
Reputation: 47
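Assigning the full output of predict_proba to a single column gives an error, because it returns one column per class. Working around that naively can also mask an index-alignment bug in how the probabilities are attached to the DataFrame!
This gives an incorrect result when testDF does not have a default 0..n-1 index: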
y_pred_prob_df = pd.DataFrame(model.predict_proba(test))
testDF['Prob_0'] = y_pred_prob_df[0]
testDF['Prob_1'] = y_pred_prob_df[1]
print(test.shape)
Validation:
# Assumes test is a DataFrame here and y_pred_test = model.predict(test)
predicted = test.loc[y_pred_test == 1]
predicted.reset_index(inplace=True)
prob_predicted = y_pred_prob_df.loc[y_pred_test == 1]
prob_predicted.reset_index(inplace=True)
concat_all shows whether the indices match. A plain column assignment aligns on the index and can put non-matching data on the same row. Doing a concat makes the mismatch clearly visible, so it can be handled.
concat_all = pd.concat([predicted, prob_predicted], axis=1)
print(concat_all.shape)
concat_all['a']=concat_all[0]+concat_all[1]
concat_all = concat_all[concat_all['a'].notnull()]
print(concat_all.shape)
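To make the alignment problem concrete, here is a minimal sketch. The toy testDF, its index values, and the proba array (a stand-in for the output of model.predict_proba) are all made up for illustration; only the pandas behaviour is the point.

```python
import numpy as np
import pandas as pd

# Hypothetical test frame with a non-default index (e.g. left over from filtering).
testDF = pd.DataFrame({"x": [0.1, 0.5, 0.9]}, index=[10, 20, 30])

# Stand-in for model.predict_proba(test): one row per sample, one column per class.
proba = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])

# Naive: a fresh DataFrame gets index 0..2, so the assignment aligns on labels
# that do not exist in testDF and the new column ends up all NaN.
naive = testDF.copy()
naive["Prob_1"] = pd.DataFrame(proba)[1]

# Safer: build the probability frame on testDF's own index, then join.
proba_df = pd.DataFrame(proba, index=testDF.index, columns=["Prob_0", "Prob_1"])
aligned = testDF.join(proba_df)

print(naive)
print(aligned)
```

If the indices only partly overlap, the naive version silently attaches probabilities to the wrong rows instead of producing NaN, which is harder to spot.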
Upvotes: 1
Reputation: 36598
The .predict method selects the most probable class for your input. If you want to access the probabilities you can use:
log_prob = model.predict_log_proba(test) # Log of probability estimates.
prob = model.predict_proba(test) # Probability estimates.
You can add either of these to the data frame via column assignment. Both return one column per class, so select the column for class 1:
testDF['bad_prob'] = model.predict_proba(test)[:, 1]
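Putting it together, a rough end-to-end sketch might look like this. The toy data, the feature names f1 and f2, and the features list are assumptions for illustration, not the asker's real columns. It looks up the column for class 1 through model.classes_ rather than assuming its position.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for trainDF / testDF.
rng = np.random.RandomState(0)
trainDF = pd.DataFrame({"f1": rng.randn(100), "f2": rng.randn(100)})
noise = 0.5 * rng.randn(100)
trainDF["bad"] = (trainDF["f1"] + trainDF["f2"] + noise > 0).astype(int)
testDF = pd.DataFrame({"f1": rng.randn(20), "f2": rng.randn(20)})

features = ["f1", "f2"]  # assumed feature columns
model = LogisticRegression(C=1e5)
model.fit(trainDF[features].values, trainDF["bad"].values)

test = testDF[features].values
proba = model.predict_proba(test)  # shape (n_samples, n_classes)

# Find which column of proba belongs to class 1.
col = list(model.classes_).index(1)

# NumPy arrays are assigned by position, so no index alignment is involved.
testDF["bad"] = model.predict(test)
testDF["bad_prob"] = proba[:, col]

print(testDF.head())
```

Because the right-hand sides are plain NumPy arrays, the rows stay in the same order as test even when testDF has a non-default index.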
Upvotes: 1