Reputation: 367
I was trying to run a Ridge Regression, just like this:
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV, Ridge
from regressors import stats
# Candidate regularization strengths for cross-validation
alphas = np.linspace(.00001, 100, 1000)
rr_scaled = RidgeCV(alphas=alphas, cv=5, normalize=True)
rr_scaled.fit(X_train, y_train)
It works fine, so I went to get the summary:
stats.summary(rr_scaled, X_train, y_train)
But I keep running into this error:
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1 and the array at index 1 has size 10
What does this mean? Is there something wrong with my syntax?
I found this post, "p-values from ridge regression in python", but it does exactly what I am doing, and there it works!
Upvotes: 1
Views: 392
Reputation: 323
The problem seems to be that regressors expects your data to be in a particular shape. Specifically, it seems to expect your target variable to be a 1-D array of shape (n_samples,), rather than a 2-D column "matrix" of shape (n_samples, 1).
Consider the following example, which is based on your code:
import numpy as np
import pandas as pd
from regressors import stats
from sklearn.linear_model import LinearRegression, RidgeCV, Ridge
n_features = 3
n_samples = 10
X_train = np.random.normal(0, 1, size=(n_samples, n_features))
y_train = np.random.randn(n_samples)
alphas = np.linspace(.00001, 100, 1000)
rr_scaled = RidgeCV(alphas=alphas, cv=5, normalize=True)
rr_scaled.fit(X_train, y_train)
stats.summary(rr_scaled, X_train, y_train)
If I run it, it executes fine and outputs
Residuals:
    Min      1Q  Median    3Q     Max
-2.5431 -0.8815 -0.0059  0.69  2.2218

Coefficients:
            Estimate  Std. Error  t value   p value
_intercept  0.213519    0.463767   0.4604  0.656149
x1          0.001617    0.761174   0.0021  0.998351
x2          0.006398    0.895701   0.0071  0.994457
x3         -0.003119    0.518982  -0.0060  0.995335
---
R-squared: 0.00267, Adjusted R-squared: -0.49599
F-statistic: 0.01 on 3 features
Now, if I change the target to a "matrix" shape:
y_train = np.random.randn(n_samples).reshape((-1, 1))
I get the same error you got:
Traceback (most recent call last):
  File "a.py", line 16, in <module>
    stats.summary(rr_scaled, X_train, y_train)
  File "lib/python3.8/site-packages/regressors/stats.py", line 252, in summary
    coef_df['Estimate'] = np.concatenate(
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 1 and the array at index 1 has size 3
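One plausible reading of that traceback, sketched with NumPy only: summary appears to stack the intercept on top of the coefficients with np.concatenate. The arrays below are stand-ins for the model's intercept_ and coef_ attributes, and the way they are combined is my assumption about what regressors does internally, not a copy of its source. With a 2-D target, scikit-learn linear models generally report coef_ with shape (n_targets, n_features), so the two pieces no longer line up.

```python
import numpy as np

n_features = 3

# Stand-ins for the fitted attributes when y has shape (n_samples,):
intercept_1d = np.float64(0.2)        # scalar intercept
coef_1d = np.zeros(n_features)        # shape (3,)

# Stand-ins for the fitted attributes when y has shape (n_samples, 1):
intercept_2d = np.zeros(1)            # shape (1,)
coef_2d = np.zeros((1, n_features))   # shape (1, 3)

# 1-D case: (1,) and (3,) concatenate cleanly into (4,)
ok = np.concatenate((np.array([intercept_1d]), coef_1d))

# 2-D case: (1, 1) and (1, 3) disagree along dimension 1
try:
    np.concatenate((np.array([intercept_2d]), coef_2d))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(ok.shape)
print(error_message)
```

The size in the error message (3 here, 10 in your case) would then simply be the number of features.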
So, in your particular case, you probably need to flatten the target to one dimension before fitting (this assumes y_train is a NumPy array):
y_train = y_train.reshape((-1,))
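If y_train is a pandas Series or a single-column DataFrame instead, calling reshape on it directly may not work. A minimal sketch of a more defensive conversion follows; as_1d_target is a hypothetical helper name of my own, not part of regressors or scikit-learn.

```python
import numpy as np

def as_1d_target(y):
    """Return y as a 1-D NumPy array of shape (n_samples,).

    Hypothetical helper: accepts anything np.asarray understands
    (lists, NumPy arrays, pandas Series, single-column DataFrames).
    """
    arr = np.asarray(y)
    # Refuse genuinely multi-column targets instead of silently flattening them
    if arr.ndim > 1 and arr.shape[1] != 1:
        raise ValueError(f"expected a single target column, got shape {arr.shape}")
    return arr.ravel()

# Example: a column "matrix" becomes a flat array
y_column = np.random.randn(10).reshape((-1, 1))
y_flat = as_1d_target(y_column)
print(y_column.shape, y_flat.shape)
```

You would then refit with the flattened target before calling stats.summary, so that both the model and the summary see the same 1-D y.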
Upvotes: 1