Cynthia

Reputation: 397

Error when attempting cross validation in Python

I am currently trying to implement cross validation with linear regression. The linear regression works, but when I try cross validation I get this error:

TypeError: only integer scalar arrays can be converted to a scalar index

I get this error on line 5 of my code.

Here is my code:

for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    linreg.fit(X_train, Y_train)
    # p = np.array([linreg.predict(xi) for xi in x[test]])
    p = linreg.predict(X_test)
    e = p-Y_test
    xval_err += np.dot(e,e)

rmse_10cv = np.sqrt(xval_err/len(X_train))

Can someone please help me with this problem?

Thanks in advance!

Upvotes: 1

Views: 500

Answers (1)

Imran

Reputation: 13468

There are a few problems with your code.

In line 5 Y_train is not defined. I think you want the lowercase y_train.

Similarly you want e = p-y_test on line 8.

In rmse_10cv = np.sqrt(xval_err/len(X_train)), X_train is defined inside your loop, so it will hold the value from the last iteration of the loop. Check the output where you print the training indices for each fold to make sure the length of X_train is always the same; otherwise your calculation of rmse_10cv will not be valid.
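To illustrate the point about fold sizes, here is a minimal sketch with made-up data (5 samples and 2 folds, chosen so the folds cannot be equal in size). It also shows one possible alternative, which is my assumption rather than something from your code: accumulate the squared errors and divide by the total number of held-out predictions instead of len(X_train). The names squared_error_sum, n_predictions and train_sizes are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Hypothetical toy data: 5 samples, so 2 folds cannot be the same size.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])  # exactly y = 2 * x

kf = KFold(n_splits=2)
linreg = LinearRegression()

squared_error_sum = 0.0  # running sum of squared test errors
n_predictions = 0        # total number of held-out samples scored
train_sizes = []         # length of X_train in each fold

for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    train_sizes.append(len(X_train))

    linreg.fit(X_train, y_train)
    e = linreg.predict(X_test) - y_test
    squared_error_sum += np.dot(e, e)
    n_predictions += len(y_test)

# Divide by the number of predictions actually made, not by len(X_train).
rmse_cv = np.sqrt(squared_error_sum / n_predictions)

print("train sizes per fold:", train_sizes)
print("cross-validated RMSE:", rmse_cv)
```

Because every sample lands in exactly one test fold, n_predictions equals len(X) here, so this version does not depend on which fold happened to run last.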

I ran your code with the fixes I described and with the following before the loop:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])
kf = KFold(n_splits=2)
linreg = LinearRegression()
xval_err = 0

and I did not receive any errors.
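Since the code runs cleanly with NumPy arrays, one possible cause of the original TypeError is that your X or y is a plain Python list rather than an array. Your question does not show how they are built, so this is only a guess. The sketch below reproduces that situation with hypothetical data (X_list and the hand-written train_index are made up for illustration) and shows that converting with np.asarray makes the same indexing work.

```python
import numpy as np

# Hypothetical data stored as a plain Python list instead of a NumPy array.
X_list = [[1, 2], [3, 4], [1, 2], [3, 4]]

# kf.split(...) yields index arrays like this one.
train_index = np.array([0, 2])

# A Python list cannot be indexed with an array of several indices.
try:
    X_list[train_index]
    message = None
except TypeError as exc:
    message = str(exc)

print("indexing the list failed with:", message)

# Converting to an array first makes array indexing valid.
X = np.asarray(X_list)
X_train = X[train_index]
print(X_train)
```

If your data is in a list, calling np.asarray on X and y before the loop should be enough for the indexing lines to work.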

Upvotes: 1
