Reputation: 4653
I am working on a lithology identification project similar to the one described here.
So far, the Random Forest method has yielded satisfactory results. I decided to compare its performance with that of other algorithms, namely AdaBoost and Support Vector Machines.
I therefore modified my training model function as follows:
from sklearn import preprocessing
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def train_model(training_data, features, labels, method):
    X = training_data[features]  # Features
    y = training_data[labels]    # Labels
    print(y.value_counts())
    if method == 'random_forest':
        clf = make_pipeline(preprocessing.StandardScaler(),
                            RandomForestClassifier(n_estimators=100, min_samples_split=5,
                                                   min_samples_leaf=4, max_depth=10,
                                                   bootstrap=True))
    elif method == 'adaboost':
        clf = make_pipeline(preprocessing.StandardScaler(),
                            AdaBoostClassifier(n_estimators=200, random_state=0))
    elif method == 'svm':
        clf = make_pipeline(preprocessing.StandardScaler(), SVC(gamma='auto'))
    else:
        raise ValueError(f"method not yet supported: {method}")
    print(method)
    print(cross_val_score(clf, X, y, cv=5))
    clf.fit(X, y)
    return clf
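For context, I call the function once per method. The DataFrame and column names below are placeholders standing in for my actual well-log data:

import numpy as np
import pandas as pd

# Hypothetical example data; my real DataFrame has well-log features and a lithology label
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'GR': rng.random(100),
    'RHOB': rng.random(100),
    'NPHI': rng.random(100),
    'lithology': rng.integers(0, 3, 100),
})
for method in ['adaboost', 'svm', 'random_forest']:
    train_model(df, ['GR', 'RHOB', 'NPHI'], 'lithology', method)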
The function printed the following scores:
adaboost [0.48272892 0.52855543 0.79712267 0.62000345 0.50964852]
svm [0.73589456 0.77233181 0.67117505 0.69150586 0.76162991]
random_forest [0.74700663 0.81169782 0.71183666 0.702102 0.73664714]
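To compare the methods at a glance, I also summarize each method's five folds as a mean and standard deviation (a quick sketch using only the numbers printed above):

import numpy as np

# Fold scores printed by train_model above, summarized per method as mean +/- std
scores = {
    'adaboost':      [0.48272892, 0.52855543, 0.79712267, 0.62000345, 0.50964852],
    'svm':           [0.73589456, 0.77233181, 0.67117505, 0.69150586, 0.76162991],
    'random_forest': [0.74700663, 0.81169782, 0.71183666, 0.702102, 0.73664714],
}
for method, folds in scores.items():
    print(f"{method}: mean={np.mean(folds):.3f}, std={np.std(folds):.3f}")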
I was surprised to see that the values returned by cross_val_score for AdaBoost were much lower than those obtained with the other two methods. Does this make sense, or is there an issue with the way I am calling the AdaBoost classifier in my pipeline, AdaBoostClassifier(n_estimators=200, random_state=0)?
Please note that I have not yet tuned the hyperparameters through random search, but in my experience the improvements associated with hyperparameter tuning are only marginal.
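For reference, this is roughly how I would set up that random search over the AdaBoost stage (a sketch only: the parameter ranges are placeholders I have not validated, and X and y are as in train_model):

from sklearn import preprocessing
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline

pipe = make_pipeline(preprocessing.StandardScaler(),
                     AdaBoostClassifier(random_state=0))

# Parameter names address the pipeline step by its lowercased class name
param_distributions = {
    'adaboostclassifier__n_estimators': [50, 100, 200, 400],
    'adaboostclassifier__learning_rate': [0.01, 0.1, 0.5, 1.0],
}
search = RandomizedSearchCV(pipe, param_distributions, n_iter=10, cv=5, random_state=0)
search.fit(X, y)
print(search.best_params_, search.best_score_)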
Upvotes: 1
Views: 274