Sheldon

Reputation: 4653

Unexpected poor performance of AdaBoost compared to Random Forest

I am working on a lithology identification project similar to the one described here.

So far, the Random Forest method has yielded satisfactory results. I decided to compare its performance with that of other algorithms, namely AdaBoost and Support Vector Machines.

I therefore modified my training model function as follows:

from sklearn import preprocessing
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def train_model(training_data, features, labels, method):
    X = training_data[features]  # feature columns
    y = training_data[labels]    # target labels
    print(y.value_counts())      # check class balance
    if method == 'random_forest':
        clf = make_pipeline(preprocessing.StandardScaler(),
                            RandomForestClassifier(n_estimators=100, min_samples_split=5,
                                                   min_samples_leaf=4, max_depth=10,
                                                   bootstrap=True))
    elif method == 'adaboost':
        clf = make_pipeline(preprocessing.StandardScaler(),
                            AdaBoostClassifier(n_estimators=200, random_state=0))
    elif method == 'svm':
        clf = make_pipeline(preprocessing.StandardScaler(), SVC(gamma='auto'))
    else:
        # fail fast instead of hitting a NameError on clf below
        raise ValueError("method not yet supported")
    print(method)
    print(cross_val_score(clf, X, y, cv=5))  # 5-fold cross-validation accuracy
    clf.fit(X, y)  # refit on the full training set
    return clf
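
For reference, I call the function along these lines (the DataFrame and log-curve column names here are illustrative, not my actual ones):

# well_logs is a hypothetical DataFrame of log curves plus a lithology label column
model = train_model(well_logs, ['GR', 'RHOB', 'NPHI', 'DTC'], 'lithology', 'adaboost')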

The function printed the following scores:

adaboost [0.48272892 0.52855543 0.79712267 0.62000345 0.50964852]

svm [0.73589456 0.77233181 0.67117505 0.69150586 0.76162991]

random_forest [0.74700663 0.81169782 0.71183666 0.702102 0.73664714]

I was surprised to see that the cross_val_score values for AdaBoost were much lower than those obtained with the other two methods. Does this make sense, or is there an issue with the way I am calling the AdaBoost classifier in my pipeline, AdaBoostClassifier(n_estimators=200, random_state=0)?
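
One thing I am wondering about: as far as I know, scikit-learn's AdaBoostClassifier defaults to depth-1 decision stumps as base learners, which might simply be too weak for a multi-class lithology problem. A minimal sketch of what I could try instead (the max_depth and learning_rate values are placeholders, not tuned choices):

from sklearn import preprocessing
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Replace the default depth-1 stump with a slightly deeper tree.
# 'estimator' is the parameter name in scikit-learn >= 1.2;
# earlier versions call it 'base_estimator'.
clf = make_pipeline(
    preprocessing.StandardScaler(),
    AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=3),
        n_estimators=200,
        learning_rate=0.5,  # shrinking the step size can also help
        random_state=0,
    ),
)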

Please note that I have not yet started tuning the hyperparameters through random search, but in my experience the improvements associated with hyperparameter tuning are only marginal.
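
For completeness, the random search I have in mind would look roughly like this (the parameter ranges are placeholders, not tuned values):

from sklearn.model_selection import RandomizedSearchCV

# Search over the AdaBoost step of the pipeline; 'adaboostclassifier__'
# is the step-name prefix that make_pipeline generates automatically.
param_distributions = {
    'adaboostclassifier__n_estimators': [50, 100, 200, 400],
    'adaboostclassifier__learning_rate': [0.01, 0.1, 0.5, 1.0],
}
search = RandomizedSearchCV(clf, param_distributions, n_iter=10, cv=5, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)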

Upvotes: 1

Views: 274

Answers (0)
