Archie

Reputation: 2387

Grouped Cross-validation LassoCV scikit-learn

I am running into some weird errors using the LassoCV() regressor in combination with a grouped cross-validation object.

More specifically, given a dataframe df and a target column y, I would like to perform LeaveOneGroupOut() cross-validation. I set things up as follows:

import numpy as np
from sklearn import linear_model
from sklearn.model_selection import LeaveOneGroupOut

df = np.random.rand(100, 50)
y = np.random.rand(100)
logo = LeaveOneGroupOut()
groups = np.random.randint(0, 10, 100)
lassoCV = linear_model.LassoCV(eps=0.0001, n_alphas=400, max_iter=200000, cv=logo, normalize=False, random_state=9)

Running:

lassoCV.fit(df,y)

results in the error: ValueError: The 'groups' parameter should not be None.

If I run:

lassoCV.fit(df,y,groups)

I get the error: TypeError: fit() takes 3 positional arguments but 4 were given.

Seems to me that the second option would be the way to go. Did I implement something wrong? Or is this a bug in scikit-learn?

Upvotes: 1

Views: 1297

Answers (1)

Scratch'N'Purr

Reputation: 10427

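The groups error comes from the split method of your LeaveOneGroupOut object, which requires a groups argument. As your TypeError shows, fit() here does not accept groups, so the splitter ends up being called without it. Per the LassoCV documentation, the cv argument can also be an iterable that yields (train, test) index splits. Therefore, you just need to create that iterable yourself by calling the split method with your groups, and pass the result as cv.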

gen_logo = logo.split(df, groups=groups)  # create your generator
lassoCV = linear_model.LassoCV(eps=0.0001, n_alphas=400, max_iter=200000, cv=gen_logo, normalize=False, random_state=9)  # pass it to the cv argument
lassoCV.fit(df, y)  # now fit
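
For reference, here is a minimal self-contained sketch of the same idea. It makes a few assumptions that are not in the original post: the data is smaller, the groups are evenly sized, LassoCV is left at its default alpha grid, and normalize is omitted because that parameter was removed in scikit-learn 1.2. The splits are wrapped in list() so they can be inspected and reused rather than consumed once.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import LeaveOneGroupOut

# Illustrative data (sizes are arbitrary, chosen to keep the run short)
rng = np.random.default_rng(0)
X = rng.random((60, 10))
y = rng.random(60)
groups = np.repeat(np.arange(6), 10)  # 6 groups of 10 samples each

logo = LeaveOneGroupOut()
# Build the (train, test) index pairs up front, supplying the groups here
splits = list(logo.split(X, y, groups=groups))

# Pass the precomputed splits as cv; fit() then needs only X and y
lasso_cv = LassoCV(cv=splits, max_iter=10000, random_state=0)
lasso_cv.fit(X, y)

print(len(splits))               # one split per distinct group
print(lasso_cv.mse_path_.shape)  # (number of alphas, number of folds)
print(lasso_cv.alpha_)           # alpha selected by cross-validation
```

Each test fold contains exactly one group, so the number of folds equals the number of distinct values in groups.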

Upvotes: 3
