Reputation: 45
I am working on an ML project and I keep getting this error when I run my Stochastic Gradient Descent code. Does anyone know why this error occurs or how to fix it? I can provide more information if needed.
My train/test split code:
from sklearn.model_selection import train_test_split
data = mpsc["Sample location (Urban, Rural, Remote)"]
target = mpsc["Total Filament"]
X = data
y = target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
X_train
Output for X_train:
523 2
492 3
767 3
440 2
318 1
..
838 2
965 1
518 2
75 3
564 1
Name: Sample location (Urban, Rural, Remote), Length: 806, dtype: object
Output for y_train:
523 0
492 2
767 0
440 0
318 0
..
838 0
965 1
518 0
75 3
564 1
Name: Total Filament, Length: 806, dtype: int64
My SGD code:
from sklearn.linear_model import SGDRegressor
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1)
sgd_reg.fit(X, y.ravel())
The error I get:
ValueError: Expected 2D array, got 1D array instead: array=[1. 2. 1. ... 2. 2. 3.]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Upvotes: 0
Views: 429
Reputation: 784
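As the error message suggests, you should reshape your X with X = X.to_numpy().reshape(-1, 1). (Your X is a pandas Series, and current pandas versions do not provide Series.reshape, so convert it to a NumPy array first.)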
You’re getting this error because your X is a 1D array, which you can think of as a flat Python list: [row_1_x, row_2_x, ...]. But sklearn expects a 2D array of shape (n_samples, n_features), with one inner list of feature values per row of your data, even when there is only one feature: [[row_1_x], [row_2_x], ...].
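To make the shape difference concrete, here is a minimal sketch of the fix. The small DataFrame is a made-up stand-in for your mpsc (its values are assumptions for illustration, not your real data), and it shows two ways to get a 2D X that I would expect to work: reshaping the underlying NumPy array, or selecting the column with a list of names so pandas returns a one-column DataFrame.

```python
import pandas as pd
from sklearn.linear_model import SGDRegressor

# Toy stand-in for the question's `mpsc` DataFrame (values are invented).
mpsc = pd.DataFrame({
    "Sample location (Urban, Rural, Remote)": [2, 3, 3, 2, 1, 2, 1, 3],
    "Total Filament": [0, 2, 0, 0, 0, 0, 1, 3],
})

# Selecting a single column by name gives a 1D Series: shape (8,).
X = mpsc["Sample location (Urban, Rural, Remote)"]
y = mpsc["Total Filament"]

# Option 1: convert to a NumPy array and add a feature axis: shape (8, 1).
X_2d = X.to_numpy().reshape(-1, 1)

# Option 2: select with a list of column names to get a DataFrame: shape (8, 1).
X_df = mpsc[["Sample location (Urban, Rural, Remote)"]]

# Same estimator settings as in the question; fit now receives a 2D X.
sgd_reg = SGDRegressor(max_iter=1000, tol=1e-3, penalty=None, eta0=0.1)
sgd_reg.fit(X_2d, y.to_numpy())
```

Either X_2d or X_df can be passed to fit. If you build X as a 2D object before calling train_test_split, X_train and X_test will be 2D as well.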
Upvotes: 3