itjcms18

Reputation: 4333

cross validation in sklearn with given fold splits

I'm learning how to use Lasso and Ridge with sklearn in Python. The fold assignments are given to me in a column of the data, and I want to find the best regularization parameter using 5-fold cross-validation on those folds.

My data looks like the following:

   mpg  cylinders  displacement  horsepower  weight  acceleration  origin  fold
0   18          8           307         130    3504          12.0       1     3
1   15          8           350         165    3693          11.5       1     0
2   18          8           318         150    3436          11.0       1     2
3   16          8           304         150    3433          12.0       1     2
4   17          8           302         140    3449          10.5       1     3


reg_para = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100]

mpg is the y/target variable and the other columns are the predictors. The last column contains the folds. I want to run Lasso and Ridge and find the best parameter for each. The problem I am having is incorporating the specified folds into the cross-validation. Here is what I have so far (for Lasso):

from sklearn.linear_model import Lasso, LassoCV
lasso_model = LassoCV(cv=5, alphas=reg_para)
lasso_fit = lasso_model.fit(X,y)

Is there a simple way to incorporate the fold splits? Any help is greatly appreciated.

Upvotes: 1

Views: 911

Answers (1)

eickenberg

Reputation: 14377

If your data are in a pandas DataFrame, all you need to do is access that column and pass it to a label-based cross-validation iterator:

fold_labels = df["fold"]
from sklearn.cross_validation import LeaveOneLabelOut
cv = LeaveOneLabelOut(fold_labels)

lasso_model = LassoCV(cv=cv, alphas=reg_para)

So if you have the fold labels in an array fold_labels, you can just use LeaveOneLabelOut, which holds out one label (here, one fold) at a time. (Sorry for the non-functional code. It should be sufficient to convey the idea, though.)
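For readers on a recent scikit-learn: the `sklearn.cross_validation` module was removed in 0.20, and `LeaveOneLabelOut` now lives in `sklearn.model_selection` as `LeaveOneGroupOut`. Below is a minimal sketch of the same idea using `PredefinedSplit`, which takes the fold column directly and needs no `groups` argument at fit time. It is not the code from the answer above: the DataFrame here is synthetic stand-in data (only the column names `mpg` and `fold` and the `reg_para` list come from the question), and `RidgeCV` is my assumption for how the Ridge half of the question could be handled.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.model_selection import PredefinedSplit

# Synthetic stand-in for the question's data (assumption: the real df has
# an "mpg" target column and a "fold" column with labels 0..4).
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "displacement": rng.normal(size=n),
    "horsepower": rng.normal(size=n),
    "weight": rng.normal(size=n),
    "fold": np.arange(n) % 5,
})
df["mpg"] = (2.0 * df["displacement"] - 1.0 * df["weight"]
             + rng.normal(scale=0.1, size=n))

reg_para = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100]

# Keep the fold column out of the predictors.
X = df.drop(columns=["mpg", "fold"]).to_numpy()
y = df["mpg"].to_numpy()

# Each distinct value in "fold" becomes the test set of one split.
cv = PredefinedSplit(test_fold=df["fold"].to_numpy())

lasso_model = LassoCV(cv=cv, alphas=reg_para).fit(X, y)
ridge_model = RidgeCV(cv=cv, alphas=reg_para).fit(X, y)

print("best Lasso alpha:", lasso_model.alpha_)
print("best Ridge alpha:", ridge_model.alpha_)
```

With five distinct fold labels this gives exactly five splits, so each candidate in `reg_para` is scored on the folds you were given rather than on folds sklearn picks itself.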

Upvotes: 1
