Sean

Reputation: 3385

Python Fitting Linear Regression using Greedy Feature Selection

I'm trying to fit a linear regression model using a greedy feature selection algorithm. To be a bit more specific, I have four sets of data:

X_dev, y_dev, X_test, and y_test. The first two are the features and labels for the training set, and the latter two are the features and labels for the test set. Their shapes are (900, 126), (900,), (100, 126), and (100,), respectively.

What I mean by "greedy feature selection" is that I would first fit 126 models, each using a single feature from X_dev, and choose the best one. I would then fit 125 models, each using that first feature plus one of the remaining 125 features, and again keep the best. The selection continues until I have the 100 features that perform best among the original 126.

The problem I'm facing is regarding implementation in Python. The code that I have is for fitting a single feature first:

lin_reg.fit(X_dev[:, 0].reshape(-1, 1), y_dev)
lin_pred = lin_reg.predict(X_test)

Because the dimensions don't match ((100, 126) and (1, )) I'm getting a dimension mismatch error.

How should I fix this? I'm trying to predict how the model performs when using the single feature.

Thank you.

Upvotes: 1

Views: 408

Answers (1)

John R

Reputation: 1508

Apply the same column selection to X_test, so that predict receives the same single feature the model was fit on:

lin_reg.fit(X_dev[:, 0].reshape(-1, 1), y_dev)
lin_pred = lin_reg.predict(X_test[:, 0].reshape(-1, 1))

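Note that the reshape is needed when you index with a plain integer, assuming lin_reg is a scikit-learn estimator such as LinearRegression. X_dev[:, 0] is 1D with shape (900,), while fit and predict expect a 2D array of shape (n_samples, n_features). If you want to avoid the reshape, select the column with a list (or a slice such as 0:1), which keeps the result 2D:

lin_reg.fit(X_dev[:, [0]], y_dev)
lin_pred = lin_reg.predict(X_test[:, [0]])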
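The greedy procedure described in the question can be sketched along the following lines. This is only a minimal sketch under several assumptions: lin_reg is taken to be scikit-learn's LinearRegression, "best" is taken to mean lowest mean squared error on the test set (the question does not say which metric is used), and the function name greedy_forward_selection and the synthetic data are hypothetical. Indexing the columns with a list keeps the arrays 2D, so no reshape is needed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def greedy_forward_selection(X_dev, y_dev, X_test, y_test, n_select):
    """Greedily pick n_select column indices, one at a time.

    At each step, every remaining feature is tried together with the
    features chosen so far, and the one giving the lowest test MSE is kept.
    (MSE as the selection criterion is an assumption, not from the question.)
    """
    selected = []
    remaining = list(range(X_dev.shape[1]))
    while len(selected) < n_select and remaining:
        best_feature, best_mse = None, np.inf
        for j in remaining:
            cols = selected + [j]
            # List indexing keeps the array 2D: shape (n_samples, len(cols)).
            model = LinearRegression().fit(X_dev[:, cols], y_dev)
            mse = mean_squared_error(y_test, model.predict(X_test[:, cols]))
            if mse < best_mse:
                best_feature, best_mse = j, mse
        selected.append(best_feature)
        remaining.remove(best_feature)
    return selected


# Synthetic stand-in data (hypothetical; smaller than the question's arrays).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 2] - 2.0 * X[:, 7] + rng.normal(scale=0.1, size=200)
X_dev, y_dev, X_test, y_test = X[:150], y[:150], X[150:], y[150:]

chosen = greedy_forward_selection(X_dev, y_dev, X_test, y_test, n_select=3)
print(chosen)
```

With the arrays from the question, the call would be greedy_forward_selection(X_dev, y_dev, X_test, y_test, n_select=100). Scoring candidates on the test set follows the question's setup; holding out a separate validation split for the selection step would usually give a less optimistic estimate of final performance.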

Upvotes: 1
