Reputation: 815
I need to train a classifier with data whose dimensionality can vary. For example (this is made-up data for illustration):
class-1,0,1,2,3
class-2,0,3,2,4,5,7
class-3,1,8,8,8,2,8,0,0,0
:
:
and so on...
I am trying to train a linear SVM using scikit-learn, which requires the dimensionality to be fixed. Simply zero-padding the shorter samples to match the length of the longest one is giving me disappointing results.
Should I be using a different classifier for such data? How should I approach this?
Upvotes: 0
Views: 93
Reputation: 573
Feature hashing is the technique you need to convert your variable-length input into constant-length input. You can then use the transformed vectors with any appropriate learning algorithm.
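To make the idea concrete, here is a minimal standard-library sketch of the hashing trick applied to the sample rows from the question. It assumes each value can be treated as an unordered token (a "bag of values"), which may or may not suit your data. The helper name `hash_features` and the choice of 16 buckets are purely illustrative. scikit-learn ships its own implementation as `sklearn.feature_extraction.FeatureHasher`, which you would probably use in practice.

```python
import hashlib

def hash_features(tokens, n_features=16):
    """Map a variable-length sequence to a fixed-length count vector.

    Hypothetical helper for illustration: each token is hashed to one of
    n_features buckets and that bucket's count is incremented.
    """
    vec = [0] * n_features
    for tok in tokens:
        # md5 is used only as a stable hash, not for security.
        digest = hashlib.md5(str(tok).encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % n_features
        vec[index] += 1
    return vec

# Rows from the question, with the class label removed.
rows = [
    [0, 1, 2, 3],
    [0, 3, 2, 4, 5, 7],
    [1, 8, 8, 8, 2, 8, 0, 0, 0],
]

# Every row now has the same length, whatever its original size.
X = [hash_features(r) for r in rows]
```

`X` can then be passed to a fixed-dimension classifier such as a linear SVM. If the position of a value matters, one option is to hash a combined token such as `f"{i}:{v}"` instead of the value alone. Distinct tokens can collide in the same bucket, so `n_features` is worth tuning.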
Upvotes: 1
Reputation: 126
Try padding with the feature mean or median instead; that is another way to deal with missing data. Are those measurements taken at the same points/features?
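As a rough sketch of what I mean, assuming position j really is the same feature in every sample (that is the question above, and if it is not, this will not help), you could fill the missing tail of each row with the per-position median of the values that are present. The helper name `pad_with_column_median` is made up for this example.

```python
from statistics import median

def pad_with_column_median(rows):
    """Pad shorter rows to the longest length using per-position medians.

    Hypothetical helper for illustration: the fill value for position j
    is the median of all rows that actually have a value at position j.
    """
    width = max(len(r) for r in rows)
    fill = []
    for j in range(width):
        observed = [r[j] for r in rows if len(r) > j]
        fill.append(median(observed))
    # Keep each row's own values and append the fill values for the rest.
    return [list(r) + fill[len(r):] for r in rows]

# Rows from the question, with the class label removed.
rows = [
    [0, 1, 2, 3],
    [0, 3, 2, 4, 5, 7],
    [1, 8, 8, 8, 2, 8, 0, 0, 0],
]

padded = pad_with_column_median(rows)
```

Swapping `median` for `statistics.mean` gives the mean-padded variant. In scikit-learn, `sklearn.impute.SimpleImputer` does the same kind of imputation if you first pad the rows with NaN.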
Upvotes: 1