Reputation: 4653
I am working on a lithology identification project similar to the one described here.
So far, the Random Forest method has yielded satisfactory results. I decided to compare its performance with that of other algorithms, namely AdaBoost and Support Vector Machines.
I therefore modified my training model function as follows:
from sklearn import preprocessing
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def train_model(training_data, features, labels, method):
    X = training_data[features]  # Features
    y = training_data[labels]    # Labels
    print(y.value_counts())
    if method == 'random_forest':
        clf = make_pipeline(preprocessing.StandardScaler(),
                            RandomForestClassifier(n_estimators=100, min_samples_split=5,
                                                   min_samples_leaf=4, max_depth=10,
                                                   bootstrap=True))
    elif method == 'adaboost':
        clf = make_pipeline(preprocessing.StandardScaler(),
                            AdaBoostClassifier(n_estimators=200, random_state=0))
    elif method == 'svm':
        clf = make_pipeline(preprocessing.StandardScaler(), SVC(gamma='auto'))
    else:
        raise ValueError(f"method not yet supported: {method}")
    print(method)
    print(cross_val_score(clf, X, y, cv=5))
    clf.fit(X, y)
    return clf
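For context, I call the function once per method. The DataFrame and column names below are placeholders standing in for my actual well-log data:

import numpy as np
import pandas as pd

# Hypothetical example data; my real DataFrame has well-log features and a lithology label
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'GR': rng.random(100),
    'RHOB': rng.random(100),
    'NPHI': rng.random(100),
    'lithology': rng.integers(0, 3, 100),
})
for method in ['adaboost', 'svm', 'random_forest']:
    train_model(df, ['GR', 'RHOB', 'NPHI'], 'lithology', method)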
The function printed the following scores:
adaboost [0.48272892 0.52855543 0.79712267 0.62000345 0.50964852]
svm [0.73589456 0.77233181 0.67117505 0.69150586 0.76162991]
random_forest [0.74700663 0.81169782 0.71183666 0.702102 0.73664714]
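To compare the methods at a glance, I also summarize each method's five folds as a mean and standard deviation (a quick sketch using only the numbers printed above):

import numpy as np

# Fold scores printed by train_model above, summarized per method as mean +/- std
scores = {
    'adaboost':      [0.48272892, 0.52855543, 0.79712267, 0.62000345, 0.50964852],
    'svm':           [0.73589456, 0.77233181, 0.67117505, 0.69150586, 0.76162991],
    'random_forest': [0.74700663, 0.81169782, 0.71183666, 0.702102, 0.73664714],
}
for method, folds in scores.items():
    print(f"{method}: mean={np.mean(folds):.3f}, std={np.std(folds):.3f}")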
I was surprised to see that the values returned by cross_val_score for AdaBoost were much lower than those obtained with the other two methods. Does this make sense, or is there an issue with the way I am calling the AdaBoost classifier in my pipeline, AdaBoostClassifier(n_estimators=200, random_state=0)?
Please note that I have not yet tuned the hyperparameters through random search, but in my experience the improvements associated with hyperparameter tuning are only marginal.
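For reference, this is roughly how I would set up that random search over the AdaBoost stage (a sketch only: the parameter ranges are placeholders I have not validated, and X and y are as in train_model):

from sklearn import preprocessing
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(preprocessing.StandardScaler(),
                     AdaBoostClassifier(random_state=0))

# Parameter names address the pipeline step by its lowercased class name
param_distributions = {
    'adaboostclassifier__n_estimators': [50, 100, 200, 400],
    'adaboostclassifier__learning_rate': [0.01, 0.1, 0.5, 1.0],
}
search = RandomizedSearchCV(pipe, param_distributions, n_iter=10, cv=5, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)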
Upvotes: 1
Views: 274