Reputation: 187
I have a 12-dimensional data set of 1000 samples. Each sample is composed of 5 temperature values, 5 price values, one integer value representing a judgement by a human expert (undecided=0, good=1, bad=2, danger=4), and a binary decision variable that I want to learn to predict.
How can I find a classifier that can cope with this heterogeneous data?
I was thinking about building one classifier for each possible human judgement (0, 1, 2, 4), so 4 classifiers. For each human judgement value, I would:
- standardize (center and scale) the temperature and price values
- maybe use PCA to remove some irrelevant features
- use a machine learning method for classification (such as a multilayer neural network or an SVM)
Is my approach correct? (What if there were 1000 possible human judgements instead of 4?)
Upvotes: 1
Views: 959
Reputation: 3098
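A typical way of encoding categories for SVMs or ANNs is the 1-of-C (one-hot) encoding: each of the C categories gets its own binary input, and exactly one of those inputs is 1 for a given sample.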
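A minimal sketch of a 1-of-C encoding for the judgement variable from the question, in plain Python. The names `JUDGEMENT_CODES` and `one_of_c` are made up for illustration; the codes 0, 1, 2, 4 are taken from the question.

```python
# Judgement codes from the question: undecided=0, good=1, bad=2, danger=4.
# The constant and function names here are hypothetical, chosen for illustration.
JUDGEMENT_CODES = [0, 1, 2, 4]


def one_of_c(value, categories=JUDGEMENT_CODES):
    """Return a 1-of-C vector: one slot per category, 1.0 only in the matching slot."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


# Encode a few example judgements; each becomes a vector of length 4.
encoded = [one_of_c(v) for v in [0, 4, 1]]
print(encoded)
```

With this encoding the single integer column becomes C binary columns, so the classifier does not see an artificial ordering or distance between codes such as 2 and 4.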
Generally, almost every classifier can deal with heterogeneous data, but you have to preprocess the inputs (scale, normalize, ...). There should be plenty of hints in the links I gave you.
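As one possible reading of "scale, normalize", the sketch below standardizes each numeric column to zero mean and unit variance using only the standard library. The function name `standardize_columns` and the toy rows are assumptions for illustration, not data from the question.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Scale each column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has standard deviation 0; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Made-up toy rows: (temperature, price). Real data would have 5 of each.
rows = [[10.0, 100.0], [20.0, 300.0], [30.0, 200.0]]
scaled = standardize_columns(rows)
print(scaled)
```

In practice the means and standard deviations would be computed on the training samples only and then reused unchanged on any test samples, so that no information leaks from the test set.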
Upvotes: 2