S Andrew

Reputation: 7198

Python ValueError: n_splits=3 cannot be greater than the number of members in each class

I am working on a face recognition project where I have two people with two face images each:

1. personA
    image1.jpg
    image2.jpg

2. personB
    image1.jpg
    image2.jpg

I am trying to train a model on the face embeddings of the above dataset like this:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
params = {"C": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0], "gamma": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]}
model = GridSearchCV(SVC(kernel="rbf", gamma="auto", probability=True), params, cv=3, n_jobs=-1)
model.fit(data["embeddings"], labels)

where the length of both data["embeddings"] and labels is 4. data["embeddings"] contains the face embeddings (as ndarrays) of personA and personB:

data['embeddings'] = [
                         [0.02331057, -0.01995077, ...], 
                         [-0.00034041,  0.02753334, ...], 
                         [0.02454563, -0.03797123, ...], 
                         [0.10561685, -0.08444008, ...]
                     ]

labels = [0, 0, 1, 1]

But I am getting the following error at model.fit(data["embeddings"], labels):

ValueError: n_splits=3 cannot be greater than the number of members in each class.

I am not able to understand this error. Can anyone please explain this issue to me and how I can resolve it?

Upvotes: 0

Views: 4059

Answers (1)

desertnaut

Reputation: 60319

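On close reading, the error message is clear and self-explanatory: since you have only two (2) samples for each of your classes, you cannot run a cross-validation with 3 folds. That would require at least 3 samples in each of your classes. (For a classifier, GridSearchCV turns an integer cv into stratified k-fold, which tries to keep the class proportions the same in every fold, so each class has to be split across all of the folds.)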
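The constraint can be expressed as a small check on the labels: the smallest class bounds how many stratified folds are possible. The helper below is a hypothetical sketch (`max_stratified_folds` is not a scikit-learn function). Strictly speaking, scikit-learn raises this particular error only when n_splits exceeds the size of every class, and merely warns when it exceeds only some of them; the helper takes the conservative bound.

```python
from collections import Counter


def max_stratified_folds(labels):
    """Hypothetical helper (not part of scikit-learn): largest fold count
    for which every class has at least as many samples as there are folds."""
    counts = Counter(labels)
    smallest = min(counts.values())
    # k-fold cross-validation needs at least 2 folds
    if smallest < 2:
        raise ValueError("each class needs at least 2 samples for k-fold CV")
    return smallest


print(max_stratified_folds([0, 0, 1, 1]))        # 2
print(max_stratified_folds([0, 0, 0, 1, 1, 1]))  # 3
```

With the labels from the question this returns 2, which matches the answer's suggestion of cv=2.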

I guess it should work with cv=2 without throwing any error, but your whole approach (i.e. a dataset with only 4 samples) seems highly questionable.
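One way to check the cv=2 suggestion without running the full grid search is to call StratifiedKFold directly on data shaped like the question's. This is a sketch under an assumption: the real embeddings are elided in the question, so four random 128-dimensional vectors stand in for them (the dimension is arbitrary and only the labels matter for the split).

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in for data["embeddings"]: 4 random 128-d vectors.
X = np.random.default_rng(0).normal(size=(4, 128))
y = np.array([0, 0, 1, 1])  # two samples per class, as in the question


def try_splits(n_splits):
    """Return the number of folds produced, or the ValueError message."""
    skf = StratifiedKFold(n_splits=n_splits)
    try:
        # split() is a generator, so the class-size check runs on iteration
        return len(list(skf.split(X, y)))
    except ValueError as exc:
        return str(exc)


print(try_splits(3))  # the same ValueError message as in the question
print(try_splits(2))  # 2
```

Passing the same small number as cv to GridSearchCV is the corresponding change in the original code, though with only one training sample per class in each fold the search results would say very little.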

Upvotes: 2
