Raj Shah

Reputation: 13

How can I find the probability of a model classifying an input as [0,1]?

I am working on a classification problem in which I want to find the "probability of an input being classified as [1,0]" and "not [1,0]".

I tried the predict_proba method of SVC, but it gives the probability of each class, which is not what I'm looking for.

from sklearn.svm import SVC

model = SVC(probability=True)
model.fit(final_data,foreclosure_y)
results = model.predict_proba(final_data_test)[0]

I expect my output to look like this:

index,y
---------    
0,0.45
1,0.62
2,0.43
3,0.12
4,0.55

Note: the output above is in .csv form, where y is the test_y.

Here the column y holds, for each instance indexed from 0 to 4, the probability that it could be classified as 0 or 1.

E.g., index 0 has probability 0.45 of being classified as 0 or 1.

Upvotes: 0

Views: 1066

Answers (1)

desertnaut

Reputation: 60321

Notice that

sum([0.58502114, 0.41497886])
# 1.0

predict_proba gives the probabilities for both your classes (hence the array elements sum up to 1), in the order that they appear in model.classes_; quoting from the docs (which are always your best friend in such situations):

Returns the probability of the sample for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.

Here is an example with toy data to illustrate the idea:

from sklearn.svm import SVC
model = SVC(probability=True)
X = [[1,2,3], [2,3,4]] # feature vectors
Y = [0, 1] # classes
model.fit(X, Y)

Let's now get the predicted probabilities for the first instance in the training set [1,2,3]:

model.predict_proba(X)[0]
# array([0.39097541, 0.60902459])

OK, what is the order - i.e., which probability belongs to which class?

model.classes_
# array([0, 1])

So, this means that the probability for the instance belonging to class 0 is the first element of the array 0.39097541, while the probability for belonging to class 1 is the second element 0.60902459; and again, they sum up to 1, as expected:

sum([0.39097541, 0.60902459])
# 1.0
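If you would rather not rely on remembering the column order, you can look the column up from model.classes_ directly. The following is only a minimal sketch: the two-cluster toy data, the fixed seeds, and the variable names (rng, proba, col, p_class1) are assumptions made for illustration and are not part of the question's data.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: two well-separated clusters of 20 points each
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 1, size=(20, 3)),
               rng.normal(4, 1, size=(20, 3))])
y = np.array([0] * 20 + [1] * 20)

model = SVC(probability=True, random_state=0)
model.fit(X, y)

# One row per instance, one column per class, ordered as in model.classes_
proba = model.predict_proba(X)

# Locate the column that belongs to class 1 instead of hard-coding its index
col = int(np.where(model.classes_ == 1)[0][0])
p_class1 = proba[:, col]  # probability of class 1 for every instance
```

With labels 0 and 1 the lookup returns column 1, but the same pattern should also work for other label values (strings, for example), where the sorted order is less obvious.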

UPDATE

Now, in outputs such as the one you require, we don't put both probabilities; by convention, and for binary classification, we only include the probability for each instance belonging to class 1; here is how we can do it for the toy dataset X shown above of only 2 instances:

pred = model.predict_proba(X)
pred
# array([[ 0.39097541,  0.60902459],
#        [ 0.60705475,  0.39294525]])

import pandas as pd
out = pd.DataFrame(pred[:,1],columns=['y']) # keep only the second element of the arrays in pred, i.e. the probability for class 1
print(out)

Result:

          y
0  0.609025
1  0.392945
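Since the question asks for a .csv with the columns index and y, here is a rough sketch of how the dataframe could be written out in that layout. The pred array simply reuses the two toy rows shown above as a stand-in for model.predict_proba(final_data_test), and the in-memory buffer is used only so that the snippet runs without touching the disk.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for model.predict_proba(final_data_test): the two toy rows from above
pred = np.array([[0.39097541, 0.60902459],
                 [0.60705475, 0.39294525]])

# Keep only the probability of class 1 (second column) under the header "y"
out = pd.DataFrame({"y": pred[:, 1]})

# index_label names the index column, giving the header line "index,y"
buf = io.StringIO()
out.to_csv(buf, index_label="index")
csv_text = buf.getvalue()
print(csv_text)
```

To write an actual file, pass a path instead of the buffer, e.g. out.to_csv("predictions.csv", index_label="index"), where the file name is just a placeholder.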

Upvotes: 2
