Sklearn | LinearRegression | Fit

Question

I'm having a few issues with the LinearRegression estimator in scikit-learn. I have trawled through the forums and Googled a lot, but I haven't managed to get past the error. I am using Python 3.5.

Below is what I've attempted, but I keep getting a ValueError: "Found input variables with inconsistent numbers of samples: [403, 174]"

X = df[["Impressions", "Clicks", "Eligible_Impressions", "Measureable_Impressions", "Viewable_Impressions"]].values

y = df["Total_Conversions"].values.reshape(-1,1)

print("The shape of X is {}".format(X.shape))
print("The shape of y is {}".format(y.shape))

The shape of X is (577, 5)
The shape of y is (577, 1)

X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)
linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)
print (y_pred)

print("The shape of X_train is {}".format(X_train.shape))
print("The shape of y_train is {}".format(y_train.shape))
print("The shape of X_test is {}".format(X_test.shape))
print("The shape of y_test is {}".format(y_test.shape))

The shape of X_train is (403, 5)
The shape of y_train is (174, 5)
The shape of X_test is (403, 1)
The shape of y_test is (174, 1)

Am I missing something glaringly obvious?

Any help would be greatly appreciated.

Kind Regards, Adrian

Bob Haffner · Accepted Answer

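Looks like your train and test sets contain different numbers of rows for X and y. That's because you're storing the return values of train_test_split() in the wrong order. It returns the train and test split of each input array in turn, so with (X, y) the order is X_train, X_test, y_train, y_test. In your code the variable named y_train actually holds the test rows of X, which is why its shape is (174, 5).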

Change this

X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)

To this

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state = 42)
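
If you want to sanity-check the fix, here is a minimal sketch. It assumes synthetic random data in place of your df (the arrays, coefficients and noise level are made up for illustration; only the 577 x 5 shape is taken from your post), and it checks the shapes before fitting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the question's data (assumption): 577 rows, 5 features.
rng = np.random.default_rng(42)
X = rng.random((577, 5))
coef = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up coefficients
y = (X @ coef + rng.normal(scale=0.1, size=577)).reshape(-1, 1)

# train_test_split returns the splits of each input array in turn:
# X_train, X_test first, then y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Features and targets must have the same number of rows in each split.
assert X_train.shape[0] == y_train.shape[0]
assert X_test.shape[0] == y_test.shape[0]

linreg = LinearRegression()
linreg.fit(X_train, y_train)
y_pred = linreg.predict(X_test)

print("The shape of X_train is {}".format(X_train.shape))
print("The shape of y_train is {}".format(y_train.shape))
print("The shape of X_test is {}".format(X_test.shape))
print("The shape of y_test is {}".format(y_test.shape))
```

With the unpacking order fixed, X_train and y_train share the same row count, as do X_test and y_test, so fit() no longer complains about inconsistent sample numbers.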
