Stuart C

Reputation: 175

Adaboost vs. Gaussian Naive Bayes

I'm new to Adaboost, but have been reading about it, and it seemed like the perfect solution for a problem I've been working on.

I have a data set where the classes are 'UP' and 'DOWN'. The Gaussian Naive Bayes classifier classifies both classes with ~55% accuracy (weakly accurate). I thought that using Adaboost with Gaussian Naive Bayes as my base estimator would give me greater accuracy; however, when I do this, my accuracy drops to around 45-50%.

Why is this? I find it very unusual that Adaboost would underperform its base estimator. Additionally, any tips for getting Adaboost to work better would be appreciated. I have tried it with many different estimators with similarly poor results.
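For reference, here is a minimal sketch of what I'm doing, using scikit-learn; the synthetic data set below is just a stand-in for my real 'UP'/'DOWN' data, and the parameter values are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the real 'UP'/'DOWN' data set.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Plain Gaussian Naive Bayes baseline.
gnb = GaussianNB()
print("GNB alone:        ", cross_val_score(gnb, X, y, cv=5).mean())

# AdaBoost with Gaussian Naive Bayes as the base estimator
# (scikit-learn < 1.2 uses base_estimator= instead of estimator=).
boosted = AdaBoostClassifier(estimator=GaussianNB(), n_estimators=50, random_state=0)
print("AdaBoost with GNB:", cross_val_score(boosted, X, y, cv=5).mean())
```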

Upvotes: 1

Views: 608

Answers (1)

Sir_Jingo

Reputation: 46

The reason could be the diversity dilemma of ensemble methods, which particularly concerns the Adaboost algorithm. Diversity describes how uncorrelated the errors of the component classifiers are: we want the components to make different mistakes, otherwise the ensemble performs no better (and sometimes worse) than a single component classifier. On the other hand, if the base classifiers are weak but still reasonably accurate, and their errors are diverse, the final ensemble can reach a higher accuracy than any single component.
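If you want to check this empirically, one rough, purely illustrative way is to fit the boosted model and measure how often pairs of its fitted component classifiers disagree on the same data. The synthetic data and the pairwise-disagreement measure below are my own assumptions, not something taken from the paper:

```python
import itertools
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Fit the boosted model; estimators_ holds the fitted component classifiers.
clf = AdaBoostClassifier(estimator=GaussianNB(), n_estimators=20, random_state=0).fit(X, y)
preds = np.array([est.predict(X) for est in clf.estimators_])

# Mean pairwise disagreement between component classifiers: values near 0 mean
# the components make (almost) the same predictions, i.e. there is little diversity.
pairs = itertools.combinations(range(len(preds)), 2)
disagreement = np.mean([np.mean(preds[i] != preds[j]) for i, j in pairs])
print("mean pairwise disagreement:", disagreement)
```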

This is well explained in this paper, from which we can take the following explanation:

[Figure: Accuracy and diversity dilemma of Adaboost]

This diagram is a scatter-plot where each point corresponds to a component classifier. The x coordinate value of a point is the diversity value of the corresponding component classifier while the y coordinate value is the accuracy value of the corresponding component classifier. From this figure, it can be observed that, if the component classifiers are too accurate, it is difficult to find very diverse ones, and combining these accurate but non-diverse classifiers often leads to very limited improvement (Windeatt, 2005). On the other hand, if the component classifiers are too inaccurate, although we can find diverse ones, the combination result may be worse than that of combining both more accurate and diverse component classifiers. This is because if the combination result is dominated by too many inaccurate component classifiers, it will be wrong most of the time, leading to poor classification results.

To directly answer your question: it may be that using Gaussian Naive Bayes as the base estimator creates component classifiers that do not disagree with each other enough (i.e. their errors are not diverse), so Adaboost ends up generalizing even worse than the single Gaussian Naive Bayes.
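As a practical tip, Adaboost is usually run with high-bias base learners such as depth-1 decision trees ("stumps"), whose predictions change a lot as the sample weights change, which tends to give the ensemble the diversity it needs. Here is a minimal sketch, again assuming scikit-learn and synthetic data in place of your 'UP'/'DOWN' set, with placeholder hyperparameters you would want to tune:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Boost decision stumps: each stump is weak on its own, but reweighting the
# samples changes which split it picks, so the components stay diverse.
# (scikit-learn < 1.2 uses base_estimator= instead of estimator=.)
stump_boost = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=200,
    learning_rate=0.5,
    random_state=0,
)
print("AdaBoost with stumps:", cross_val_score(stump_boost, X, y, cv=5).mean())
```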

Upvotes: 0
