Error: Shapes (1,4) and (14,14) not aligned

Question

So I'm a newbie to Machine Learning and a little baffled by this error:

Shapes (1,4) and (14,14) not aligned: 4 (dim 1) != 14 (dim 0)

Here is the full error:

File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/utils/extmath.py", line 140, in safe_sparse_dot
    return np.dot(a, b)

ValueError: shapes (1,4) and (14,14) not aligned: 4 (dim 1) != 14 (dim 0)

My test set has 4 rows of data and my training set has 14 rows, which I think is what (1,4) and (14,14) refer to.

I'm trying to fit a simple linear regression to a Training set as indicated by my code below:

# Fit Simple Linear Regression to Training Set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
X_train = X_train.reshape(1,-1)
y_train = y_train.reshape(1,-1)
regressor.fit(X_train, y_train)

Then predict the Test Set Results:

# Predicting the Test Set Results
X_test = X_test.reshape(1,-1)
y_pred = regressor.predict(X_test)

My code is failing on the last line with the above error:

y_pred = regressor.predict(X_test)

Any hints in the right direction would be great.

Here is my whole code sample:

# Simple Linear Regression

# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Import dataset
dataset = pd.read_csv('NBA.csv')
X = dataset.iloc[:, 1].values
y = dataset.iloc[:, :-1].values

# Splitting the dataset into Train and Test
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling
# None

# Fit Simple Linear Regression to Training Set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
X_train = X_train.reshape(1,-1)
y_train = y_train.reshape(1,-1)
regressor.fit(X_train, y_train)

# Predicting the Test Set Results
X_test = X_test.reshape(1,-1)
y_pred = regressor.predict(X_test)

** EDIT ** I checked the shape of X and y. Here is my output below:

dataset = pd.read_csv('NBA.csv')
X = dataset.iloc[:, 1].values
y = dataset.iloc[:, :-1].values
print(X.shape)
print(y.shape)
-->(18,)
-->(18, 1)

dkato · Accepted Answer

Please replace reshape(1,-1) with reshape(-1, 1) everywhere it is used. The former turns an array into (1 person x n features), while the latter turns it into (n persons x 1 feature). The feature is height in this case.
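To make the difference concrete, here is a minimal NumPy-only sketch. The arrays are hypothetical stand-ins for the 14 training values and 4 test values from the question, and the zero matrix only imitates the shape of the coefficients: fitting on X of shape (1, 14) and y of shape (1, 14) gives one sample with 14 features and 14 targets, hence a 14 x 14 coefficient matrix.

```python
import numpy as np

# Hypothetical stand-ins for the 14 training and 4 test values in the question.
x_train = np.arange(14, dtype=float)
x_test = np.arange(4, dtype=float)

# reshape(1, -1): ONE sample with many features.
wide_train = x_train.reshape(1, -1)   # shape (1, 14)
wide_test = x_test.reshape(1, -1)     # shape (1, 4)

# reshape(-1, 1): MANY samples with one feature each.
tall_train = x_train.reshape(-1, 1)   # shape (14, 1)
tall_test = x_test.reshape(-1, 1)     # shape (4, 1)

print(wide_train.shape, wide_test.shape)
print(tall_train.shape, tall_test.shape)

# Placeholder with the same shape as the coefficients learned from the
# wide arrays (14 targets x 14 features). The values are irrelevant here.
coef = np.zeros((14, 14))

# Prediction multiplies the test array by the coefficients, so a (1, 4)
# test array cannot be combined with a (14, 14) matrix.
message = ""
try:
    np.dot(wide_test, coef.T)
except ValueError as err:
    message = str(err)
    print(message)
```

With the tall arrays, the model sees 14 samples of 1 feature during fitting and 4 samples of 1 feature during prediction, so the feature counts match.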

If you change the import section as shown below, the later reshape calls are no longer needed, because the arrays already have the form (n persons x 1 feature).

# Import dataset
dataset = pd.read_csv('NBA.csv')
X = dataset.iloc[:, 1].values
y = dataset.iloc[:, 0].values

X = X.reshape(-1, 1)
y = y.reshape(-1, 1)

Early versions of sklearn accepted a 1-D vector as input. That has since changed, and you now need to state explicitly whether the vector is (1 sample x n features) or (n samples x 1 feature), using reshape or a similar method.
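Putting it together, the following is a sketch of the whole pipeline with the corrected shapes. It is not the asker's data: NBA.csv is not available here, so the 18 rows are synthetic, and the height and weight numbers are invented for illustration. It also imports train_test_split from sklearn.model_selection, which is where it lives in current releases (sklearn.cross_validation was removed in 0.20).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for NBA.csv: 18 rows, one feature, one target.
rng = np.random.RandomState(0)
X = np.linspace(180.0, 215.0, 18)                     # hypothetical heights
y = 0.9 * X - 70.0 + rng.normal(scale=2.0, size=18)   # hypothetical weights

# Make X explicitly (n samples x 1 feature) BEFORE splitting.
X = X.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# No further reshaping is needed for fit or predict.
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

print(X_train.shape, X_test.shape)
print(y_pred.shape)
```

Reshaping once, right after loading, keeps the train and test arrays consistent, because train_test_split preserves the column dimension.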
