Vallamkonda Neelima

Reputation: 227

ValueError: x and y must have same first dimension in linear regression in python

I wrote a linear regression model with a single variable, but it raises a ValueError when I run the following code:

import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression as lr
import numpy as np


x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

reg = lr().fit(x.reshape(10, 1), y.reshape(10, 1))

y_l = reg.intercept_ + reg.coef_ * x
plt.plot(x, y_l)  # this call raises the ValueError
plt.show()

When I reshape the NumPy array x with x.reshape(10,1) in the linear equation, the ValueError goes away, but I don't understand why:

import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression as lr
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

reg = lr().fit(x.reshape(10, 1), y.reshape(10, 1))

# x is reshaped to a (10, 1) column here: no error
y_l = reg.intercept_ + reg.coef_ * x.reshape(10, 1)
plt.plot(x, y_l)
plt.show()

Can anyone help me with this? Thanks in advance.

Upvotes: 1

Views: 246

Answers (2)

Celius Stingher

Reputation: 18367

This happens because x is a 1D array of shape (10,), while reg.coef_ is a 2D array of shape (n_targets, n_features), here (1, 1). Multiplying the two broadcasts to a 2D result of shape (1, 10), whose first dimension no longer matches x. To get a result that lines up with x, either reshape x to (10, 1), as you did, or flatten reg.coef_ to 1D.

This should also work:

import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression as lr
import numpy as np

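x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

reg = lr().fit(x.reshape(10, 1), y.reshape(10, 1))

# Flatten coef_ from (1, 1) to (1,) so the product with x stays 1D
y_l = reg.intercept_ + reg.coef_.reshape(1) * x
plt.plot(x, y_l)
plt.show()
print(reg.coef_.shape)  # (1, 1)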
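
As a quick check that the two fixes are interchangeable, the sketch below fits the same model and compares both expressions. It assumes NumPy and scikit-learn are installed; the names y_col and y_flat are made up for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

reg = LinearRegression().fit(x.reshape(10, 1), y.reshape(10, 1))

# Fix from the question: reshape x into a column -> 2D result
y_col = reg.intercept_ + reg.coef_ * x.reshape(10, 1)

# Fix from this answer: flatten coef_ -> 1D result
y_flat = reg.intercept_ + reg.coef_.reshape(1) * x

print(y_col.shape)   # (10, 1)
print(y_flat.shape)  # (10,)
print(np.allclose(y_col.ravel(), y_flat))  # True
```

Both give the same fitted values; they differ only in shape, and either shape has 10 entries along the first dimension, which is all plt.plot needs.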

Upvotes: 0

jfaccioni

Reputation: 7509

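reg.coef_ is a 2D array with shape (1, 1) in this case. It is 2D because y was passed to fit as a 2D array of shape (10, 1): scikit-learn then stores the coefficients with shape (n_targets, n_features), which is how it accounts for multiple targets and multiple coefficients. With a 1D y, coef_ would be 1D, with shape (n_features,).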

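Broadcasting rules make the expression reg.coef_ * x return a 2D array of shape (1, 10). plt.plot then gets an x with 10 entries along its first dimension and a y_l with only 1, resulting in the error you see.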
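
To see the shapes without fitting anything, here is a small NumPy-only sketch. The coef and intercept values are placeholders I picked for illustration; only their shapes mirror reg.coef_ and reg.intercept_ from the question.

```python
import numpy as np

x = np.arange(10)             # shape (10,), like x in the question
coef = np.array([[1.2]])      # placeholder value, shape (1, 1) like reg.coef_
intercept = np.array([0.9])   # placeholder value, shape (1,) like reg.intercept_

bad = intercept + coef * x                   # (1, 1) * (10,)   -> (1, 10)
col = intercept + coef * x.reshape(10, 1)    # (1, 1) * (10, 1) -> (10, 1)
flat = intercept + coef.reshape(1) * x       # (1,)   * (10,)   -> (10,)

print(bad.shape, col.shape, flat.shape)
# (1, 10) (10, 1) (10,)
```

Only bad has a first dimension that differs from len(x), which is exactly what plt.plot complains about.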

In your case, I'd say the cleanest expression to fix this is:

y_l = reg.intercept_ + reg.coef_.reshape(1) * x
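
Two other options, sketched here on the assumption that you don't actually need y to be 2D: pass y to fit as a 1D array, so coef_ comes back 1D, or let the model compute the line with reg.predict. The names X, reg_1d and y_pred are just for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
X = x.reshape(-1, 1)  # features must still be 2D: (n_samples, n_features)

# Option 1: 1D target -> coef_ has shape (1,), so plain broadcasting works
reg_1d = LinearRegression().fit(X, y)
y_l = reg_1d.intercept_ + reg_1d.coef_ * x
print(reg_1d.coef_.shape, y_l.shape)  # (1,) (10,)

# Option 2: let the model evaluate the fitted line
y_pred = reg_1d.predict(X)
print(y_pred.shape)  # (10,)
```

Either y_l or y_pred can be passed straight to plt.plot(x, ...), since both are 1D with 10 entries.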

Upvotes: 1
