Reputation: 276
I'm new to machine learning and created a logistic model using sklearn, but I can't find any documentation on how to get p-values for my feature variables or for the model as a whole. I have checked the Stack Overflow link but didn't get the required output. Please help. Thanks in advance.
Upvotes: 7
Views: 25049
Reputation: 24623
One can use the regressors package for this. The following code is from: https://regressors.readthedocs.io/en/latest/usage.html
import numpy as np
from sklearn import datasets, linear_model
from regressors import stats

# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this example needs an older scikit-learn release.
boston = datasets.load_boston()

# Keep all 13 features except the one at index 3 (CHAS, a dummy variable)
which_betas = np.ones(13, dtype=bool)
which_betas[3] = False
X = boston.data[:, which_betas]
y = boston.target

# Fit an ordinary least squares linear regression
ols = linear_model.LinearRegression()
ols.fit(X, y)

# To calculate the p-values of the beta coefficients:
print("coef_pval:\n", stats.coef_pval(ols, X, y))

# To print the summary table:
print("\n=========== SUMMARY ===========")
xlabels = boston.feature_names[which_betas]
stats.summary(ols, X, y, xlabels)
Output:
coef_pval:
[2.66897615e-13 4.15972994e-04 1.36473287e-05 4.67064962e-01
1.70032518e-06 0.00000000e+00 7.67610259e-01 1.55431223e-15
1.51691918e-07 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00]
=========== SUMMARY ===========
Residuals:
     Min       1Q  Median      3Q      Max
-26.3743  -1.9207  0.6648  2.8112  13.3794

Coefficients:
              Estimate  Std. Error   t value   p value
_intercept   36.925033    4.915647    7.5117  0.000000
CRIM         -0.112227    0.031583   -3.5534  0.000416
ZN            0.047025    0.010705    4.3927  0.000014
INDUS         0.040644    0.055844    0.7278  0.467065
NOX         -17.396989    3.591927   -4.8434  0.000002
RM            3.845179    0.272990   14.0854  0.000000
AGE           0.002847    0.009629    0.2957  0.767610
DIS          -1.485557    0.180530   -8.2289  0.000000
RAD           0.327895    0.061569    5.3257  0.000000
TAX          -0.013751    0.001055  -13.0395  0.000000
PTRATIO      -0.991733    0.088994  -11.1438  0.000000
B             0.009827    0.001126    8.7256  0.000000
LSTAT        -0.534914    0.042128  -12.6973  0.000000
---
R-squared: 0.73547, Adjusted R-squared: 0.72904
F-statistic: 114.23 on 12 features
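The example above fits a linear regression, while the question asks about a logistic model. As a rough sketch of how p-values are usually obtained in the logistic case, the code below fits an unpenalized logistic regression with Newton-Raphson in plain NumPy and computes Wald z-test p-values from the inverse Fisher information. The function names fit_logistic_newton and wald_p_values, and the synthetic data, are made up for illustration and are not part of sklearn or regressors. Keep in mind that sklearn's LogisticRegression applies L2 regularization by default, so Wald p-values like these only make sense for coefficients from an unpenalized fit.

```python
import math
import numpy as np


def fit_logistic_newton(X, y, n_iter=50, tol=1e-10):
    """Unpenalized logistic regression via Newton-Raphson (illustrative helper).

    X must already contain an intercept column of ones.
    Returns the coefficient vector and its estimated covariance matrix.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # predicted probabilities
        w = p * (1.0 - p)                        # Bernoulli variances
        grad = X.T @ (y - p)                     # gradient of the log-likelihood
        info = X.T @ (X * w[:, None])            # Fisher information X^T W X
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Covariance = inverse Fisher information at the final estimate
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))
    return beta, cov


def wald_p_values(beta, cov):
    """Two-sided Wald test p-values under a standard normal reference."""
    se = np.sqrt(np.diag(cov))
    z = beta / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    p = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    return se, z, p


# Synthetic data (an assumption for the demo): x1 has a real effect, x2 does not
rng = np.random.default_rng(0)
n = 1000
features = rng.normal(size=(n, 2))
X = np.column_stack([np.ones(n), features])      # prepend intercept column
true_beta = np.array([-0.5, 2.0, 0.0])
prob = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = (rng.uniform(size=n) < prob).astype(float)

beta_hat, cov = fit_logistic_newton(X, y)
se, z, p_values = wald_p_values(beta_hat, cov)

for name, b, s, zi, pv in zip(["_intercept", "x1", "x2"], beta_hat, se, z, p_values):
    print(f"{name:<10} {b: .4f} {s: .4f} {zi: .3f} {pv: .3g}")
```

If adding a dependency is acceptable, a statistics-oriented package such as statsmodels reports the same kind of table directly, which is usually less error-prone than computing it by hand.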
Upvotes: 7