Reputation: 3385
I'm trying to fit a linear regression model using a greedy feature selection algorithm. To be a bit more specific, I have four arrays: X_dev, y_dev, X_test, and y_test. The first two are the features and labels for the training set; the latter two are for the test set. Their shapes are (900, 126), (900,), (100, 126), and (100,), respectively.
What I mean by "greedy feature selection" is: first fit 126 models, each using a single feature from X_dev, and choose the best one; then fit models combining the chosen feature with each of the remaining 125 features, and so on. The selection continues until I have obtained the 100 features that perform best among the original 126.
The problem I'm facing is with the implementation in Python. The code I have so far fits a model on a single feature:
lin_reg.fit(X_dev[:, 0].reshape(-1, 1), y_dev)
lin_pred = lin_reg.predict(X_test)
Because the dimensions don't match (X_test has shape (100, 126), while the model was fit on a single feature), I'm getting a dimension mismatch error.
How should I fix this? I'm trying to evaluate how the model performs when using only that single feature.
Thank you.
Upvotes: 1
Views: 408
Reputation: 1508
Apply the same transformation (select the column, then reshape) to X_test:
lin_reg.fit(X_dev[:, 0].reshape(-1, 1), y_dev)
lin_pred = lin_reg.predict(X_test[:, 0].reshape(-1, 1))
Note that the reshape is needed here: scikit-learn estimators expect a 2D feature array, so fitting on the 1D slice X_dev[:, 0] would raise a ValueError. If you want to avoid the explicit reshape, indexing with a list keeps the slice 2D:
lin_reg.fit(X_dev[:, [0]], y_dev)
lin_pred = lin_reg.predict(X_test[:, [0]])
works as well.
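For the broader goal, here is a minimal sketch of the greedy forward-selection loop described in the question, assuming scikit-learn's LinearRegression and mean squared error as the scoring metric (the question doesn't name one), and reusing the question's array names:

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

selected = []                            # feature indices chosen so far
remaining = list(range(X_dev.shape[1]))  # candidate feature indices

while len(selected) < 100:
    best_j, best_err = None, float("inf")
    for j in remaining:
        cols = selected + [j]            # candidate set: chosen features plus one new
        model = LinearRegression().fit(X_dev[:, cols], y_dev)
        err = mean_squared_error(y_test, model.predict(X_test[:, cols]))
        if err < best_err:
            best_j, best_err = j, err
    selected.append(best_j)              # keep the feature that helped most
    remaining.remove(best_j)

Scoring on the test set mirrors the question's setup, but in practice you would want a separate validation split (or cross-validation) for the selection step and keep X_test for the final evaluation.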
Upvotes: 1