Inkidu616

Reputation: 399

Visualize logistic regression fit with stats models

Straightforward question, really. I just fit a logistic regression to some data:

import statsmodels.api as sm

# Fit a logistic regression of flow2 on latency_condition
logit = sm.Logit(df.flow2, df.latency_condition)
result = logit.fit()

print(result.summary())

Which yields:

                          Logit Regression Results                           
==============================================================================
Dep. Variable:                  flow2   No. Observations:                 5930
Model:                          Logit   Df Residuals:                     5929
Method:                           MLE   Df Model:                            0
Date:                Mon, 10 Sep 2018   Pseudo R-squ.:                 -0.3009
Time:                        21:18:35   Log-Likelihood:                -3927.8
converged:                       True   LL-Null:                       -3019.2
                                        LLR p-value:                       nan
=====================================================================================

I now would like to plot this result on top of my data points, but I have no idea how to do this. I used seaborn to plot a regression:

sns.lmplot(x="latency_condition", logistic=True, y="flow2", data=df)
plt.show()

I know lmplot uses statsmodels, but I'm not sure whether the way I fit the model is exactly the same as what lmplot does. Also, I just want to be able to plot the complete logistic regression curve (from y=1 to y=0). So how do I plot this statsmodels result? Alternative approaches are welcome.

Edit:

Daneel below gave me a straightforward solution, and I believe it's correct. I'm still not sure what the difference is between fitting the logistic regression my way and what lmplot does. I'm guessing I should mirror my x-axis, or fit a different curve, due to the downward slope of my data?

This is what lmplot gives me:

[image: lmplot output]

And this is the result of the regression:

[image: regression result plot]

Upvotes: 2

Views: 6788

Answers (1)

Daneel R.

Reputation: 547

OK, so I tested a solution, and it works. Try this:

import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

HOW_MANY = 10
x = np.random.randn(HOW_MANY)
y = np.linspace(0, 1, HOW_MANY)
df = pd.DataFrame({'x': x, 'y': y})

# Fit the logistic regression (no intercept, as in the question)
logit = sm.Logit(df['y'], df['x']).fit()

# Predict over an evenly spaced grid covering the observed x range
pred_input = np.linspace(x.min(), x.max(), HOW_MANY)
predictions = logit.predict(pred_input)

# Plot the data points with the fitted curve on top
plt.scatter(df['x'], df['y'])
plt.plot(pred_input, predictions, c='red')
plt.show()

If you want to extend the red curve further to the right or left, just pass a pred_input array that spans a larger range.

I know lmplot uses statsmodels, but I'm not sure whether the way I fit the model is exactly the same as what lmplot does.

You don't have any guarantee, since sns.lmplot() fits a new regression of its own when you call it like that. What you want is to plot the predictions of your fitted Logit model over a mock input vector that spans the range of possible inputs, or as much of it as is feasible; 10 to 100 points is usually enough.

Upvotes: 5
