Best learning model for high-dimensional numerical data? (with RapidMiner)

Question

I have a dataset of approx. 4800 rows with 22 attributes, all numerical, mostly describing the geometry of rocks/minerals, and 3 different classes.

I ran a cross-validation with a k-NN model inside it, with k = 7 and the numerical measure set to Canberra distance, and got an accuracy of 82.53% and a kappa of 0.673. Is that result representative for the dataset? I mean, 82% is quite OK.
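For reference, the Canberra distance and the kappa statistic mentioned above can be written out in plain Python. This is only a minimal sketch: the function names are made up for illustration, none of it is RapidMiner's API, and RapidMiner's own operators may handle zero values and voting ties differently.

```python
from collections import Counter


def canberra(a, b):
    """Canberra distance: sum of |a_i - b_i| / (|a_i| + |b_i|).

    Terms where both values are 0 are skipped (an assumption; this is a
    common convention for the 0/0 case).
    """
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total


def knn_predict(train_X, train_y, query, k=7):
    """Majority vote among the k training rows closest to `query`."""
    ranked = sorted(zip(train_X, train_y), key=lambda row: canberra(row[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # only one class on both sides, kappa is undefined; return 0 by choice
    return (observed - expected) / (1.0 - expected)
```

Kappa is the number to look at when the 3 classes are not equally frequent: a model that mostly predicts the largest class can still reach a high accuracy, while its kappa stays close to 0.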

Before doing this, I selected the best subset of attributes with a decision table, which gave me 6 attributes.

The problem is that you still don't learn much from instance-based models like k-NN. Can I get any more insight out of k-NN? I don't know how to visualize the clusters in such a high-dimensional space in RapidMiner. Is that somehow possible? I also tried a decision tree on the data, but it came out with far too many branches (300 or so) and looked too messy to read. The difficulty is that all the numerical attributes have about the same mean and distribution, so it's hard to find a distinct subset of meaningful attributes.

Ideally, the staff wants to "learn" something about the data, but my impression is that you cannot learn much that is meaningful from it, and that what works best are "black box" models like neural nets, SVMs, and instance-based models. How should I proceed?
