Reputation: 7198
I am working on a face recognition project where I have two people with two face images each:
1. personA
image1.jpg
image2.jpg
2. personB
image1.jpg
image2.jpg
I am trying to train the model on the face embeddings of the above dataset, like below:
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
params = {"C": [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0], "gamma": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1]}
model = GridSearchCV(SVC(kernel="rbf", gamma="auto", probability=True), params, cv=3, n_jobs=-1)
model.fit(data["embeddings"], labels)
where the length of both data["embeddings"] and labels is 4. data["embeddings"] contains the ndarray of face embeddings for personA and personB:
data['embeddings'] = [
[0.02331057, -0.01995077, ..],
[-0.00034041, 0.02753334, ..],
[0.02454563, -0.03797123, ...],
[0.10561685, -0.08444008, ...]
]
labels = [0 0 1 1]
But I am getting the error below at model.fit(data["embeddings"], labels):
ValueError: n_splits=3 cannot be greater than the number of members in each class.
I am not able to understand this error. Can anyone please explain this issue and how I can resolve it?
Upvotes: 0
Views: 4059
Reputation: 60319
On close reading, the error message is clear and self-explanatory: since you have only two (2) samples for each of your classes, you cannot run cross-validation with 3 folds. That would require at least 3 samples in each class.
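To make the constraint concrete, here is a minimal sketch that computes the largest number of folds a stratified split can use for a given label list. The helper name max_stratified_folds is hypothetical (it is not part of scikit-learn); it just counts the members of the smallest class, which is the limit the error message refers to.

```python
from collections import Counter

def max_stratified_folds(labels):
    """Return the size of the smallest class.

    A stratified k-fold split needs at least one sample of every class
    per fold, so n_splits cannot exceed this value.
    """
    return min(Counter(labels).values())

labels = [0, 0, 1, 1]  # same labels as in the question
print(max_stratified_folds(labels))
```

For the labels in the question this gives 2, so cv=3 is rejected. Note that cross-validation also needs at least 2 folds, so a class with a single sample cannot be used with stratified splitting at all.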
I guess it should work with cv=2 without throwing any error, but your whole approach (i.e. a dataset with only 4 samples) seems highly questionable.
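As a rough sketch of the cv=2 suggestion, the snippet below caps the number of folds at the size of the smallest class before running the grid search. The embeddings here are random stand-ins (the 128-dimensional size is an assumption, not taken from the question), the parameter grid is shortened, and probability=True is left out to keep the example small.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Made-up data standing in for the real face embeddings:
# 4 samples, 128 dimensions (the dimension is an assumption).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 128))
labels = np.array([0, 0, 1, 1])

# Never ask for more folds than the smallest class has members.
n_splits = min(3, int(np.bincount(labels).min()))

# Shortened grid, for illustration only.
params = {"C": [0.1, 1.0, 10.0], "gamma": [0.1, 0.5, 1.0]}
model = GridSearchCV(SVC(kernel="rbf"), params, cv=n_splits)
model.fit(embeddings, labels)

print(n_splits)
print(model.best_params_)
```

This only removes the error. With one training sample per class in each fold, the selected parameters say very little about how the model will behave on new faces.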
Upvotes: 2