Reputation: 17617
I would like to know how sklearn.linear_model.LassoCV performs cross-validation. In particular, how are the samples divided into folds: is it a random or a deterministic process?
For example, suppose I have 100 samples and use 10-fold cross-validation, and let F be the function that sends each sample to its fold.
Is it F(1:10)=1, F(11:20)=2, ..., or is it a random process (for example F(1)=8, F(2)=7, ...)?
Let me know if the question is not clear.
Thanks :)
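OK, this is the solution:
# Written against the old sklearn.cross_validation API (removed in scikit-learn 0.20).
from sklearn.linear_model import LassoCV
from sklearn.cross_validation import KFold
# Shuffle the sample indices before splitting them into 10 folds.
kf = KFold(len(y), n_folds=10, shuffle=True)
cv = LassoCV(cv=kf).fit(x, y)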
Upvotes: 0
Views: 1887
Reputation: 46
I assume you're passing the keyword argument cv=10 to the LassoCV constructor?
If this is the case, then this will create a KFold object with 10 folds: take a look at where check_cv is called in LinearModelCV (LassoCV's parent).
KFold takes a random_state keyword argument (which defaults to None, so numpy.random will try to seed from /dev/urandom or something similar), but if shuffle is False (which it is by default), then random_state doesn't actually do anything. The folds are selected from adjacent members of the data set.
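To check the adjacent-folds behaviour on the 100-sample example from the question, here is a small sketch. It assumes a recent scikit-learn, where KFold lives in sklearn.model_selection and takes n_splits; the array X and the name fold_of are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

n_samples = 100
X = np.zeros((n_samples, 1))  # dummy data; only the number of rows matters

# shuffle defaults to False, so each test fold is a contiguous block of rows.
kf = KFold(n_splits=10)

# fold_of[i] is the (1-based) fold in which sample i is held out,
# i.e. the function F from the question, with 0-based sample indices.
fold_of = np.empty(n_samples, dtype=int)
for fold, (_, test_idx) in enumerate(kf.split(X), start=1):
    fold_of[test_idx] = fold

print(fold_of[:10])    # samples 0..9
print(fold_of[10:20])  # samples 10..19
```

Under these assumptions the first ten samples land in fold 1, the next ten in fold 2, and so on.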
If you want to randomise the folds, you should create a KFold object with shuffle=True and use that object as the cv keyword argument instead of 10.
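As a rough sketch of that suggestion with the newer sklearn.model_selection API (an assumption about your installed version; in releases before 0.18 KFold was imported from sklearn.cross_validation and took the sample count and n_folds instead). The synthetic data from make_regression and the seed 42 are placeholders, not anything from the question.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

# Placeholder data standing in for the asker's x and y.
X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

# shuffle=True randomises which samples go into which fold;
# an integer random_state makes that assignment repeatable.
kf = KFold(n_splits=10, shuffle=True, random_state=42)

# With a fixed integer seed, repeated calls to split() give the same folds.
first_test_a = next(iter(kf.split(X)))[1]
first_test_b = next(iter(kf.split(X)))[1]

# Pass the splitter object as cv instead of the integer 10.
model = LassoCV(cv=kf).fit(X, y)
print(model.alpha_)
```

Leaving random_state at None should give a different fold assignment on each run, which is closer to the fully random behaviour asked about.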
Upvotes: 3