emax

Reputation: 7255

Python: sklearn KFold raises "got multiple values for keyword argument 'shuffle'"

I am trying to perform cross-validation with classic k-fold using sklearn:

def train_and_evaluate(clf, X_train, y_train):
    clf.fit(X_train, y_train)
    # create a k-fold cross validation iterator of k=5 folds
    cv = KFold(int(X_train.shape[0]), 4, shuffle = True)  ## Classic KFold
    scores = cross_val_score(clf, X_train, y_train, cv=cv)
    return (clf, scores) 

X_train, X_test, y_train, y_test =  train_test_split(X, Y, test_size=0.20, random_state=42)
scaler  = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test  = scaler.transform(X_test)

but when I call it like this:

clf1, scores1 = train_and_evaluate(linear_model.SGDRegressor(), X_train, y_train)

I get the following error:

TypeError: __init__() got multiple values for keyword argument 'shuffle'

Upvotes: 0

Views: 2053

Answers (2)

Harsha ganesh pspk

Reputation: 1

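import numpy as np
from sklearn.model_selection import KFold

x = np.arange(100)

kf = KFold(n_splits=5, shuffle=True, random_state=None)

# split() yields (train_indices, test_indices) for each fold
for train_idx, test_idx in kf.split(x):
    print(train_idx, test_idx)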
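To make the output of split() concrete: it yields index arrays, not data. With 100 samples and n_splits=5, each test fold holds 20 indices, and the test folds are disjoint and together cover every sample once. The sketch below checks that; random_state=0 is an arbitrary choice (an assumption) so the shuffle is reproducible.

```python
import numpy as np
from sklearn.model_selection import KFold

x = np.arange(100)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

# Collect the test indices produced for each of the 5 folds
test_folds = [test_idx for _, test_idx in kf.split(x)]
fold_sizes = [len(t) for t in test_folds]
print(fold_sizes)  # [20, 20, 20, 20, 20]

# The test folds are disjoint and together cover every sample exactly once
covers_all = np.array_equal(np.sort(np.concatenate(test_folds)), x)
print(covers_all)  # True
```

To get the actual rows for a fold, index the data with these arrays, e.g. x[train_idx] and x[test_idx].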

Upvotes: 0

Sevy

Reputation: 698

The signature of KFold looks like this (in scikit-learn 0.22 and later the default for n_splits is 5 rather than 3):

sklearn.model_selection.KFold(n_splits=3, shuffle=False, random_state=None)

So when you pass the two positional arguments (int(X_train.shape[0]), 4), the first is bound to n_splits and the 4 is bound to shuffle. You then pass shuffle by keyword as well, so shuffle receives two values, which is what the "multiple values for keyword argument" error means.
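The binding rule can be reproduced without sklearn. The function kfold_like below is hypothetical (not part of sklearn); it only mirrors the parameter order of the KFold signature quoted above. Note that Python 3 words the message as "got multiple values for argument 'shuffle'", while the question shows the Python 2 wording "for keyword argument".

```python
def kfold_like(n_splits=5, shuffle=False, random_state=None):
    """Hypothetical stand-in with the same parameter order as KFold."""
    return n_splits, shuffle, random_state


# Two positional arguments: 120 binds to n_splits and 4 binds to shuffle
bound = kfold_like(120, 4)
print(bound)  # (120, 4, None)

# Adding shuffle=True supplies shuffle a second time, so Python raises TypeError
try:
    kfold_like(120, 4, shuffle=True)
    message = None
except TypeError as exc:
    message = str(exc)
print(message)
```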

I'm not sure why you are passing these two positional arguments, but if you want a 4-fold split, you only need to pass 4, e.g. KFold(n_splits=4, shuffle=True).
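A minimal sketch of the question's function with only the fold count passed, and shuffle and random_state passed by keyword (current scikit-learn releases make them keyword-only anyway). The call pattern with the sample count first resembles the old sklearn.cross_validation.KFold(n, n_folds, ...) API, which was removed in scikit-learn 0.20. Assumptions: synthetic data from make_regression stands in for the question's X and Y, and random_state=42 in KFold is an arbitrary choice for reproducible folds.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler


def train_and_evaluate(clf, X_train, y_train, n_splits=4):
    clf.fit(X_train, y_train)
    # Pass only the number of folds positionally; shuffle and seed by keyword
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    # cross_val_score clones clf, so each fold is trained from scratch
    scores = cross_val_score(clf, X_train, y_train, cv=cv)
    return clf, scores


# Synthetic data stands in for the question's X and Y (assumption)
X, Y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.20, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

clf1, scores1 = train_and_evaluate(SGDRegressor(random_state=0), X_train, y_train)
print(scores1)  # one R^2 score per fold
print("mean R^2:", np.mean(scores1))
```

For a regressor, cross_val_score uses the estimator's default score method, which for SGDRegressor is R^2.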

Upvotes: 1
