Reputation: 399
Straightforward question, really. I just fit a logistic regression to some data:
logit = sm.Logit(df.flow2, df.latency_condition)
result = logit.fit()
print(result.summary())
Which yields:
                           Logit Regression Results
==============================================================================
Dep. Variable:                  flow2   No. Observations:                 5930
Model:                          Logit   Df Residuals:                     5929
Method:                           MLE   Df Model:                            0
Date:                Mon, 10 Sep 2018   Pseudo R-squ.:                 -0.3009
Time:                        21:18:35   Log-Likelihood:                -3927.8
converged:                       True   LL-Null:                       -3019.2
                                        LLR p-value:                       nan
=====================================================================================
I would now like to plot this result on top of my data points, but I have no idea how to do this. I used seaborn to plot a regression:
sns.lmplot(x="latency_condition", logistic=True, y="flow2", data=df)
plt.show()
I know lmplot uses statsmodels, but I'm not sure whether the way I fit the model is exactly the same as how lmplot does it. Also, I just want to be able to plot the complete logistic regression curve (from y=1 to y=0). So how do I plot this statsmodels result? Alternative approaches are welcome.
Edit:
Daniel below gave me a straightforward solution, and I believe it's correct. I'm not sure what the difference is between fitting logistic regression my way, and what lmplot does. I'm guessing I should mirror my x-axis, or fit a different curve, due to the downward slope of my data?
This is what lmplot gives me:
And this is the result of the regression:
Upvotes: 2
Views: 6788
Reputation: 547
Ok so I tested a solution, and it works. Try this:
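import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
# Toy data: random x values and y values evenly spaced in [0, 1]
HOW_MANY = 10
x = np.random.randn(HOW_MANY)
y = np.linspace(0, 1, HOW_MANY)
df = pd.DataFrame({'x': x, 'y': y})
# Fit the model (no intercept term, same as in the question)
logit = sm.Logit(df['y'], df['x']).fit()
# Evaluate the fitted curve on an evenly spaced grid over the observed x range
pred_input = np.linspace(x.min(), x.max(), HOW_MANY)
predictions = logit.predict(pred_input)
# Scatter the data and draw the fitted curve on top
plt.scatter(df['x'], df['y'])
plt.plot(pred_input, predictions, c='red')
plt.show()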
If you want to extend the red curve further to the right or left, just pass a pred_input array that spans a larger range.
I know lmplot uses statsmodels, but I'm not sure whether the way I fit the model is exactly the same as how lmplot does it.
You don't have any guarantee, since sns.lmplot() fits a new regression of its own when you call it the way you suggest. What you want is to plot the predictions of your fitted Logit model, by feeding it a mock input vector that ranges across the space of all possible inputs, or as much of it as is feasible. Somewhere between 10 and 100 values is usually enough.
Upvotes: 5