Reputation: 2387
I am running into some weird errors using the LassoCV() regressor in combination with a grouped cross-validation object.
More specifically, given a dataframe df and a target column y, I would like to perform LeaveOneGroupOut() cross-validation. If I run the following:
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import LeaveOneGroupOut

df = np.random.rand(100, 50)
y = np.random.rand(100)
logo = LeaveOneGroupOut()
groups = np.random.randint(0, 10, 100)
lassoCV = linear_model.LassoCV(eps=0.0001, n_alphas=400, max_iter=200000, cv=logo, normalize=False, random_state=9)
Running:
lassoCV.fit(df,y)
results in the error: ValueError: The 'groups' parameter should not be None.
If I run:
lassoCV.fit(df,y,groups)
I get the error: TypeError: fit() takes 3 positional arguments but 4 were given.
Seems to me that the second option would be the way to go. Did I implement something wrong? Or is this a bug in scikit-learn?
Upvotes: 1
Views: 1297
Reputation: 10427
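The groups error refers to the groups parameter of your LeaveOneGroupOut's split method: you passed the splitter itself as cv, and the fit call that raised the TypeError has no way to hand groups on to it, so split ends up being called with groups=None. Per the documentation referenced here, the cv argument can also be an iterable that yields train/test splits. Therefore, you just need to create the generator object yourself using the split method, passing groups there.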
gen_logo = logo.split(df, groups=groups) # create your generator
lassoCV = linear_model.LassoCV(eps=0.0001, n_alphas=400, max_iter=200000, cv=gen_logo, normalize=False, random_state=9) # pass it to the cv argument
lassoCV.fit(df, y) # now fit
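For anyone who wants to check this end to end, here is a minimal self-contained sketch of the same idea. A few things in it are my own assumptions rather than part of the question: the names X, splits and model are illustrative, the groups are built with a modulo so that every group is guaranteed to be non-empty, the splits are materialised into a list so they can be inspected, and normalize is left out because recent scikit-learn releases no longer accept it. The other LassoCV settings are left at their defaults to keep the run short.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.random((100, 50))
y = rng.random(100)
groups = np.arange(100) % 10  # assumption: 10 groups, each non-empty

logo = LeaveOneGroupOut()

# Without groups, the splitter raises the ValueError from the question
# as soon as the first split is requested.
try:
    next(logo.split(X, y))
    raised = False
except ValueError:
    raised = True

# Passing groups to split() gives (train_idx, test_idx) pairs;
# a list of them is a valid value for the cv argument.
splits = list(logo.split(X, y, groups=groups))

model = LassoCV(max_iter=200000, cv=splits)
model.fit(X, y)

# mse_path_ has one column per fold, i.e. one per left-out group
n_folds = model.mse_path_.shape[1]
```

Using a list instead of the raw generator is only a convenience: the generator is exhausted after one pass, whereas the list can be reused for another estimator or inspected afterwards.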
Upvotes: 3