Peter Schneider

Reputation: 31

SVM poor performance compared to Random Forest

I am using the scikit-learn library for Python on a classification problem. I used a RandomForestClassifier and an SVM (the SVC class). However, while the RF achieves about 66% precision and 68% recall, the SVM only reaches about 45% for each.

I did a grid search over the parameters C and gamma for the RBF-SVM, and also tried scaling and normalization beforehand. Still, I think the gap between the RF and the SVM is too large.
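For reference, a minimal sketch of such a grid search, with scaling done inside a pipeline so each CV fold is scaled using only its training data. The dataset and parameter ranges here are illustrative placeholders, not the asker's actual data or grid:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the (undisclosed) dataset.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Scaling inside the pipeline avoids leaking test-fold statistics.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])
param_grid = {
    "svc__C": np.logspace(-2, 3, 6),      # illustrative range
    "svc__gamma": np.logspace(-4, 1, 6),  # illustrative range
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```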

What else should I consider to get an adequate SVM performance?

I thought it should be possible to get at least roughly equal results. (All scores are obtained by cross-validation on the very same training and test sets.)

Upvotes: 3

Views: 1130

Answers (1)

ogrisel

Reputation: 40149

As EdChum said in the comments, there is no rule or guarantee that any one model always performs best.

The SVM with RBF kernel model makes the assumption that the optimal decision boundary is smooth and rotation invariant (once you fix a specific feature scaling that is not rotation invariant).

The Random Forest does not make the smoothness assumption (its prediction function is piecewise constant) and favors axis-aligned decision boundaries.

The assumptions made by the RF model might just better fit the task.
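To make the axis-alignment point concrete, here is a small synthetic experiment (my own illustration, not from the answer): the same two models are scored on a problem whose true boundary is axis-aligned versus one whose boundary is diagonal. The exact numbers depend on the random data, so only the trend is meaningful:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(500, 2))
labelings = {
    # Axis-aligned boundary: class depends on the sign of x0 alone.
    "axis-aligned": (X[:, 0] > 0).astype(int),
    # Diagonal boundary: class depends on x0 + x1, which trees must
    # approximate with a staircase of axis-aligned splits.
    "diagonal": (X[:, 0] + X[:, 1] > 0).astype(int),
}

scores = {}
for name, y in labelings.items():
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    svm = SVC(kernel="rbf", gamma="scale")
    scores[name] = (
        cross_val_score(rf, X, y, cv=5).mean(),
        cross_val_score(svm, X, y, cv=5).mean(),
    )
    print(f"{name}: RF={scores[name][0]:.2f} SVM={scores[name][1]:.2f}")
```

On a toy 2D problem both models do well, but the staircase approximation is the reason a forest can lose a little accuracy on rotated boundaries, and conversely why its axis-aligned bias may happen to match a given task.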

BTW, thanks for having grid searched C and gamma and checked the impact of feature normalization before asking on stackoverflow :)

Edit: to get more insight, it might be interesting to plot the learning curves for the two models. It might be that the SVM's regularization and kernel bandwidth cannot deal with overfitting well enough, while the ensemble nature of the RF works better at this dataset size. The gap might close if you had more data. A learning curve plot is a good way to check how much your model would benefit from more samples.
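A sketch of computing those learning curves with `sklearn.model_selection.learning_curve`, again on a synthetic stand-in dataset (in practice you would plot `sizes` against the validation scores, e.g. with matplotlib):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic placeholder for the real dataset.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}
val_curves = {}
for name, model in models.items():
    # Mean cross-validated score at 5 increasing training-set sizes.
    sizes, train_scores, val_scores = learning_curve(
        model, X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=5
    )
    val_curves[name] = val_scores.mean(axis=1)
    print(name, sizes, val_curves[name].round(2))
```

If the SVM's validation curve is still rising at the largest training size while the RF's has flattened, that supports the "more data would close the gap" hypothesis.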

Upvotes: 4
