Reputation: 747
import pandas as pd
dataset = pd.read_excel('dfmodel.xlsx')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)  # predict on the held-out test set
from sklearn.metrics import r2_score
print('The R2 score of the multiple linear regression model is:', r2_score(y_test, y_pred))
With the code above, I managed to fit a multiple linear regression and compute its R2 score. How do I get the beta coefficients of each predictor variable?
Upvotes: 1
Views: 15063
Reputation: 183
Personally, I prefer the single step of np.polyfit() with degree 1 specified. Note that np.polyfit() expects a 1-D x array, so this applies when you have a single predictor:
import numpy as np
np.polyfit(X, y, 1)[0]  # for degree 1, [0] is the slope (beta); higher degrees return more coefficients
So for your question, if I'm understanding it correctly, you're looking to regress the predicted y values against the initial y values, which would be:
np.polyfit(y_test, y_pred, 1)[0]
I would test np.polyfit(X_test, y_pred, 1)[0] instead, though.
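As a minimal runnable sketch of that approach (using hypothetical toy data, since np.polyfit() only accepts a 1-D x):
import numpy as np
# toy 1-D data: y = 2x + 1 plus noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2 * x + 1 + rng.normal(0, 0.5, 100)
slope, intercept = np.polyfit(x, y, 1)  # degree-1 coefficients, highest power first
print(slope, intercept)  # slope close to 2, intercept close to 1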
Upvotes: 2
Reputation: 21274
Use regressor.coef_. You can see how these coefficients map onto the predictor variables, in order, by comparing against a statsmodels implementation:
from sklearn.linear_model import LinearRegression
# fit without an intercept, to match the uncentered statsmodels fit below
regressor = LinearRegression(fit_intercept=False)
regressor.fit(X, y)
regressor.coef_
# array([0.43160901, 0.42441214])
The statsmodels version:
import statsmodels.api as sm
# no constant term is added here, matching fit_intercept=False above
mod = sm.OLS(y, X)
res = mod.fit()
print(res.summary())
                                 OLS Regression Results
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.624
Model:                            OLS   Adj. R-squared (uncentered):              0.623
Method:                 Least Squares   F-statistic:                              414.0
Date:                Tue, 29 Sep 2020   Prob (F-statistic):                   1.25e-106
Time:                        17:03:27   Log-Likelihood:                         -192.54
No. Observations:                 500   AIC:                                      389.1
Df Residuals:                     498   BIC:                                      397.5
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.4316      0.041     10.484      0.000       0.351       0.512
x2             0.4244      0.041     10.407      0.000       0.344       0.505
==============================================================================
Omnibus:                       36.830   Durbin-Watson:                   1.967
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               13.197
Skew:                           0.059   Prob(JB):                      0.00136
Kurtosis:                       2.213   Cond. No.                         2.57
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
You can do a direct equivalency test with:
np.array([regressor.coef_.round(8) == res.params.round(8)]).all() # True
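Since exact elementwise comparison of floats is fragile, np.allclose() does the same check with a tolerance:
np.allclose(regressor.coef_, res.params)  # True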
Upvotes: 0
Reputation: 434
Per the sklearn.linear_model.LinearRegression documentation page, you can find the coefficients (slopes) and the intercept at regressor.coef_ and regressor.intercept_, respectively.
If you apply sklearn.preprocessing.StandardScaler to your features before fitting the model, then the regression coefficients should be the standardized beta coefficients you're looking for.
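A minimal sketch of that approach, assuming the same X_train and y_train as in the question:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
# scale features to zero mean and unit variance, then fit the regression
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)
# coefficients on the standardized scale, one per predictor in column order
print(pipe.named_steps['linearregression'].coef_)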
Upvotes: 1