flair91023

Reputation: 51

Classification with a large number of classes

Suppose I have a training dataset of 10 million images of 100,000 different people. I want to build an ML model that can identify which person appears in a given image. What would be the best approach, given the huge number of people (classes)?

Upvotes: 5

Views: 8986

Answers (4)

Vedthedataguy

Reputation: 17

There is another way to solve this problem: convert each image into an embedding vector, then use a distance-based measure to find which known image/person is closest. This method was implemented in the FaceNet model; read about FaceNet to learn more. This is a face verification problem.
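To make the embedding idea concrete, here is a minimal sketch of the lookup step only. It assumes you already have a trained network that maps a face image to a fixed-length vector; the random vectors below are just a stand-in for that output, and the helper names (`l2_normalize`, `build_gallery`, `identify`) are made up for illustration. Averaging each person's embeddings into one reference vector is one possible design choice, not something prescribed by FaceNet.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def build_gallery(embeddings, labels):
    """Average the embeddings of each person into one reference vector."""
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.stack([embeddings[labels == i].mean(axis=0) for i in ids])
    return ids, l2_normalize(centroids)

def identify(query, ids, centroids):
    """Return the person whose reference vector is most similar to the query."""
    scores = centroids @ l2_normalize(query)
    best = int(np.argmax(scores))
    return ids[best], float(scores[best])

# Toy data standing in for the output of a trained embedding network
# (assumption: 5 people, 20 images each, 128-dimensional embeddings).
rng = np.random.default_rng(0)
true_centers = l2_normalize(rng.normal(size=(5, 128)))
labels = np.repeat(np.arange(5), 20)
embeddings = l2_normalize(true_centers[labels] + 0.02 * rng.normal(size=(100, 128)))

ids, centroids = build_gallery(embeddings, labels)
person, score = identify(embeddings[42], ids, centroids)
print(person, round(score, 3))
```

With 100,000 people the gallery is a 100,000 x d matrix, so a brute-force matrix product per query is still feasible; an approximate nearest-neighbour index would be a natural next step if latency matters.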

Upvotes: 1

JYP

Reputation: 61

One possible approach is to treat this as a verification problem instead of multi-class classification. That is, train a binary classifier for each person. You can also consult this paper (FaceNet): https://arxiv.org/abs/1503.03832
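As a rough illustration of what "verification" means in practice, the sketch below decides whether two embeddings belong to the same person by thresholding their squared distance, and then reuses that check to identify a query against enrolled people. This is the embedding-based variant rather than a literal binary classifier per person. The function names, the toy 3-dimensional vectors, and the threshold of 1.0 are all assumptions for illustration; a real threshold would have to be tuned on held-out pairs.

```python
import numpy as np

def squared_l2(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return float(np.sum((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2))

def same_person(emb_a, emb_b, threshold=1.0):
    """Verification: accept the pair if the embeddings are close enough."""
    return squared_l2(emb_a, emb_b) < threshold

def identify_by_verification(query, enrolled, threshold=1.0):
    """Compare the query to every enrolled embedding.

    Returns the closest name that passes verification, or None if nobody does.
    """
    best_name, best_dist = None, threshold
    for name, reference in enrolled.items():
        dist = squared_l2(query, reference)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name

# Hypothetical enrolled embeddings (in practice produced by a trained network).
enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}

print(same_person([0.9, 0.1, 0.0], enrolled["alice"]))
print(identify_by_verification([0.9, 0.1, 0.0], enrolled))
print(identify_by_verification([0.0, 0.0, 1.0], enrolled))
```

One practical upside of this framing is that adding a new person only means storing another reference embedding, so the model does not need a 100,000-way output layer that must be retrained whenever the set of people changes.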

Upvotes: 6

npcompleted

Reputation: 31

The number of categories a classifier can distinguish with good precision/recall depends on (but is not limited to):

How distinct is each category?

How many features can you derive from the content? (Short text definitely carries much less information than images. Since you are using a CNN for text, I assume the features would be merely characters or words.)

How well do these features differentiate between categories?

How many high-quality labeled examples do you have? (We don't have a public, large, labeled multi-category dataset for short text.)

It's hard to give you a number without knowing the answers to the questions above.

Upvotes: 2

Irfan Umar

Reputation: 197

Try a boosting algorithm, e.g. LightGBM or XGBoost. They are designed to handle large datasets like this.

Upvotes: 1
