Reputation: 31
I am using the scikit-learn library for Python for a classification problem. I used RandomForestClassifier and an SVM (the SVC class). However, while the RF achieves about 66% precision and 68% recall, the SVM only gets up to 45% for each.
I did a GridSearch for the parameters C and gamma of the RBF SVM and also tried scaling and normalizing the features in advance. However, I think the gap between the RF and the SVM is still too large.
What else should I consider to get adequate SVM performance?
I thought it should be possible to get at least equal results. (All the scores are obtained by cross-validation on the very same test and training sets.)
Upvotes: 3
Views: 1130
Reputation: 40149
As EdChum said in the comments, there is no rule or guarantee that any one model always performs best.
An SVM with an RBF kernel assumes that the optimal decision boundary is smooth and rotation invariant (once you fix a specific feature scaling, which is itself not rotation invariant).
The Random Forest does not make the smoothness assumption (its prediction function is piecewise constant) and favors axis-aligned decision boundaries.
The assumptions made by the RF model might simply fit the task better.
BTW, thanks for having grid searched C and gamma and checked the impact of feature normalization before asking on Stack Overflow :)
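For reference, here is a minimal sketch of that kind of search, with the scaler placed inside a pipeline so it is refit on each cross-validation training fold. The synthetic data from make_classification, the grid values, and the f1_macro scoring are assumptions chosen for illustration, not the setup from the question.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in data; replace with your own X, y.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

# Scaling lives inside the pipeline, so each CV fold fits the scaler
# on its own training part only.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# Assumption: a coarse logarithmic grid; widen or refine it as needed.
# make_pipeline names the SVC step "svc", hence the "svc__" prefix.
param_grid = {
    "svc__C": [0.1, 1, 10, 100],
    "svc__gamma": [1e-3, 1e-2, 1e-1, 1],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1_macro")
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

If the best parameters land on the edge of the grid, it is usually worth extending the grid in that direction before concluding that the SVM cannot do better.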
Edit: to get some more insight, it might be interesting to plot the learning curves for the two models. It may be that the SVM's regularization and kernel bandwidth cannot control overfitting well enough, while the ensemble nature of the RF works best for this dataset size. The gap might narrow if you had more data. Plotting the learning curves is a good way to check how much your model would benefit from more samples.
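A rough sketch of how the learning curves for both models could be computed with scikit-learn's learning_curve helper. The synthetic data, the hyperparameters, and the f1_macro scoring are placeholders I picked for illustration; plug in your own data and the parameters your grid search found.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in data; replace with your own X, y.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5, random_state=0
)

# Assumption: placeholder hyperparameters, not tuned values.
models = {
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "SVM (RBF)": make_pipeline(
        StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")
    ),
}

curves = {}
for name, model in models.items():
    # Train on 20%, 40%, ..., 100% of each CV training fold.
    sizes, train_scores, valid_scores = learning_curve(
        model, X, y,
        train_sizes=np.linspace(0.2, 1.0, 5),
        cv=5,
        scoring="f1_macro",
    )
    # Average the per-fold scores for each training set size.
    curves[name] = (sizes, train_scores.mean(axis=1), valid_scores.mean(axis=1))
    for n, train, valid in zip(*curves[name]):
        print(f"{name}: n={n} train={train:.3f} validation={valid:.3f}")
```

A large, persistent gap between the training and validation scores points to overfitting. A validation score that is still rising at the largest training size suggests that more samples would help. The arrays in curves can be passed to matplotlib if you prefer an actual plot.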
Upvotes: 4