wrong plot in logistic regression

Question

I am trying to implement logistic regression, but the plot I get looks wrong.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
sns.set()

x = (np.random.randint(2000, size=400)).reshape((400,1))
y = (np.random.randint(2, size=400)).reshape((400,1)).ravel()

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=0)

logistic_regr = LogisticRegression()
logistic_regr.fit(x_train, y_train)

fig, ax = plt.subplots()

ax.set(xlabel='x', ylabel='y')
ax.plot(x_test, logistic_regr.predict_proba(x_test), label='Logistic regr')
#ax.plot(x_test,logistic_regr.predict(x_test), label='Logistic regr')
ax.legend()

This gives me the following plot:

If I use:

ax.plot(x_test,logistic_regr.predict(x_test), label='Logistic regr')

I get:

nullop · Accepted Answer

You will not get a sigmoid curve with this particular choice of data. Your labels are random and unrelated to x, so the model finds no real separation between the classes and the predicted probabilities stay close to 0.5, varying only with the noise in your sample. To get a sigmoid, use an evenly spaced range of values in which the first half belongs to one class and the second half to the other. predict_proba() will then output probabilities for a given class that range from nearly 0 to nearly 1 (I assume the rest of your code stays the same):

x = np.linspace(-2, 2, 400).reshape((400,1))
y = np.concatenate((np.zeros(200), np.ones(200)))  # 1-D labels, as fit() expects

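Then plot only the second column of predict_proba(), which holds the probability of class 1 (the function returns one column per class):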

ax.plot(x_test, logistic_regr.predict_proba(x_test)[:,1], '.', label='Logistic regr')

You will get a sigmoid-shaped plot describing the probability of predicting one of the classes:
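
To compare the two cases numerically instead of by eye, here is a minimal sketch. It assumes a recent scikit-learn (model_selection instead of cross_validation) and uses a helper, positive_class_probabilities, that is my own name for illustration and not part of any library. It fits the same model on random labels and on the evenly split labels, then prints the range of predicted probabilities for class 1.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def positive_class_probabilities(x, y, seed=0):
    """Hypothetical helper: fit on a 60/40 split, return (x_test, P(y == 1)) sorted by x."""
    x_train, x_test, y_train, _ = train_test_split(
        x, y, test_size=0.4, random_state=seed
    )
    model = LogisticRegression()
    model.fit(x_train, y_train)
    # predict_proba returns one column per class; column 1 is P(y == 1).
    proba = model.predict_proba(x_test)[:, 1]
    # train_test_split shuffles, so sort by x before any line plot.
    order = np.argsort(x_test.ravel())
    return x_test.ravel()[order], proba[order]


rng = np.random.default_rng(0)

# Labels unrelated to x, as in the question.
x_random = rng.integers(0, 2000, size=400).reshape(-1, 1)
y_random = rng.integers(0, 2, size=400)

# Labels determined by x, as in the answer.
x_split = np.linspace(-2, 2, 400).reshape(-1, 1)
y_split = np.concatenate((np.zeros(200), np.ones(200)))

_, p_random = positive_class_probabilities(x_random, y_random)
_, p_split = positive_class_probabilities(x_split, y_split)

print("random labels: min/max P(y=1) =", p_random.min(), p_random.max())
print("split labels:  min/max P(y=1) =", p_split.min(), p_split.max())
```

With random labels the probabilities should stay in a narrow band around 0.5, while the split labels should cover almost the whole interval from 0 to 1. Sorting by x also lets you draw the curve with a plain line instead of '.' markers.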
