Reputation: 16461
Could someone please tell me whether the training sample sizes for each class need to be equal?
Can I use a scenario like this?
          class1   class2   class3
samples    400      500      300
Or should all the classes have equal sample sizes?
Upvotes: -1
Views: 7429
Reputation: 41488
The KNN results basically depend on three things (besides the value of k, the number of neighbors):
Consider the following example where you're trying to learn a donut-like shape in a 2D space.
By having a different density in your training data (say you have more training samples inside the donut than outside), your decision boundary will be biased: the denser inner class pushes the boundary outward, past the donut's true inner edge.
On the other hand, if your classes are relatively balanced, you'll get a much finer decision boundary that is close to the actual shape of the donut.
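A quick way to see this bias is to simulate it. Below is a minimal sketch, assuming numpy and scikit-learn are available; the sample sizes, radii, k=5, and the helpers `ring` and `boundary_radius` are all illustrative choices, not part of the original answer. It samples an inner disk (class 1) and an outer ring (class 0), fits KNN, and estimates where the learned boundary crosses the x-axis; the true boundary sits at radius 1.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def ring(n, r_min, r_max):
    """Sample n points from an annulus (or disk, if r_min=0) in 2D."""
    r = rng.uniform(r_min, r_max, n)
    theta = rng.uniform(0, 2 * np.pi, n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

def boundary_radius(knn):
    """Scan along the x-axis and report where the predicted class flips."""
    xs = np.linspace(0.0, 2.0, 1001)
    preds = knn.predict(np.column_stack([xs, np.zeros_like(xs)]))
    crossings = np.where(np.diff(preds) != 0)[0]
    return xs[crossings[0]] if len(crossings) else None

# Class 1 = inside the donut hole (radius 0..1), class 0 = the donut (radius 1..2).
for n_inside in (500, 100):  # imbalanced vs. (roughly) balanced densities
    X = np.vstack([ring(n_inside, 0.0, 1.0), ring(100, 1.0, 2.0)])
    y = np.array([1] * n_inside + [0] * 100)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
    print(n_inside, "inside samples -> boundary near x =", boundary_radius(knn))
```

With 500 inner samples the crossing lands noticeably above 1 (the denser class claims extra territory); with 100 it stays close to the true boundary.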
So basically, I would advise trying to balance your dataset (for example by under-sampling the larger classes or over-sampling the smaller ones), taking into consideration the two other items I mentioned above, and you should be fine.
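As a concrete illustration of the under-sampling option, here is a sketch assuming the training data sits in numpy arrays `X` (features) and `y` (integer labels); `undersample` is a hypothetical helper, not a library function.

```python
import numpy as np

def undersample(X, y, seed=0):
    """Randomly shrink every class to the size of the smallest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.where(y == c)[0], size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]

# With the question's sizes (400/500/300), every class ends up with 300:
X = np.random.rand(1200, 2)
y = np.array([0] * 400 + [1] * 500 + [2] * 300)
X_bal, y_bal = undersample(X, y)
print(np.unique(y_bal, return_counts=True))  # counts: [300 300 300]
```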
In case you have to deal with imbalanced training data, you could also consider using weighted KNN (WKNN), a variant of KNN that lets you assign stronger weights to the class with fewer samples.
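WKNN weighting schemes vary; one common idea is to weight each neighbor's vote by the inverse frequency of its class, so minority-class neighbors count for more. Here is a sketch of that idea, assuming numpy and scikit-learn; `wknn_predict` is a hypothetical helper, not the answer's implementation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def wknn_predict(X_train, y_train, X_test, k=5):
    """KNN where each neighbor's vote is weighted by 1 / (its class frequency)."""
    classes, counts = np.unique(y_train, return_counts=True)
    class_weight = {c: len(y_train) / n for c, n in zip(classes, counts)}
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors(X_test)  # indices of the k nearest training points
    preds = []
    for neighbors in idx:
        votes = {}
        for j in neighbors:
            c = y_train[j]
            votes[c] = votes.get(c, 0.0) + class_weight[c]
        preds.append(max(votes, key=votes.get))  # class with the largest weighted vote
    return np.array(preds)
```

This keeps all the training data (unlike under-sampling) while still preventing the majority class from dominating the vote.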
Upvotes: 7
Reputation: 14721
The k-nearest-neighbor method does not require equal sample sizes for each class, so you can use your example sample sizes. For example, see the following paper applying k-nearest neighbor to the KDD99 data set; KDD99 is a wildly imbalanced dataset, far more so than your example.
Upvotes: -1