Reputation: 13
I would like to do K-fold cross-validation with sklearn in Python. My data has 8 users, and so far I only do K-fold on the data of one user. Is it possible to do cross-validation between the users? For instance, to use 7 users as the training set and 1 user as the test set, and repeat that for each of the 8 users?
Upvotes: 1
Views: 401
Reputation: 2744
Yes, this is possible. You can use cross-validation with groups for this. Making sure that all data points from one person end up in either the training set or the test set, never both, is called grouping or blocking. In scikit-learn, you can achieve this by passing an array of group membership values to cross_val_score through its groups argument, and using the GroupKFold class as the cross-validation procedure, with n_splits set to the number of groups. With 8 users, n_splits=8 trains on 7 users and tests on the remaining one in each fold. See the example below (a simple logistic regression model, just to illustrate usage of the GroupKFold class).
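from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score
# create synthetic dataset
X, y = make_blobs(n_samples=12, random_state=0)
# the first three samples belong to the same group, etc.
groups = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]
# simple logistic regression model, only to illustrate GroupKFold
logreg = LogisticRegression()
# pass groups by keyword (it is keyword-only in recent scikit-learn versions)
scores = cross_val_score(logreg, X, y, groups=groups, cv=GroupKFold(n_splits=4))
print("cross_val_score(logreg, X, y, groups=groups, cv=GroupKFold(n_splits=4))")
print("Cross-validation scores:\n{}".format(scores))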
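For the 8-user case in the question, a minimal sketch could look like the following. It uses LeaveOneGroupOut, which is a different scikit-learn splitter from the GroupKFold shown above, but it holds out exactly one group per fold, which matches "7 users for training, 1 for testing". The data here is made up: the sample counts, feature count, labels, and the name user_ids are assumptions for illustration only, so substitute your own arrays.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Hypothetical data: 8 users, 5 samples each, 3 features per sample.
n_users, samples_per_user = 8, 5
rng = np.random.default_rng(0)
X = rng.normal(size=(n_users * samples_per_user, 3))
# Made-up binary labels; every user has both classes.
y = np.tile([0, 1, 0, 1, 1], n_users)
# One group label per sample: the id of the user it came from.
user_ids = np.repeat(np.arange(n_users), samples_per_user)

logo = LeaveOneGroupOut()

# Inspect the splits: each fold tests on one user and trains on the other seven.
held_out_users = []
for train_idx, test_idx in logo.split(X, y, groups=user_ids):
    train_users = np.unique(user_ids[train_idx]).tolist()
    test_users = np.unique(user_ids[test_idx]).tolist()
    held_out_users.append(test_users[0])
    print("train users: {}  test user: {}".format(train_users, test_users))

# One score per held-out user.
scores = cross_val_score(LogisticRegression(), X, y, groups=user_ids, cv=logo)
print("Per-user scores:\n{}".format(scores))
```

With one group per user, GroupKFold(n_splits=8) should produce equivalent folds; LeaveOneGroupOut just avoids having to keep n_splits in sync with the number of users.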
Upvotes: 3