Why is XGBoost fit slow, even with a very small dataset?

Question

As a complete newcomer to machine learning in Python, I'm trying to train an XGBoost model to predict species on the Iris dataset (https://www.kaggle.com/uciml/iris).

I'm currently focusing on XGBoost to gain some experience with it. My first training run, on 66% of the dataset with only 2 features, never completed (I interrupted it after 20 minutes). I also tried a very small sample (5 samples, 2 features), but it still doesn't finish.

Environment details: MacBook Pro 2017 with macOS 10.14.5, Python 3.7.3 via Anaconda Navigator 1.9.7.

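import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# File downloaded from the Kaggle link above
iris = pd.read_csv('Iris.csv')
# Drop the first five characters of each species name (the 'Iris-' prefix)
iris['Species'] = iris['Species'].str[5:]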

features = iris[['PetalLengthCm', 'PetalWidthCm']]
species, labels = pd.factorize(iris['Species'])

X_train, X_test, y_train, y_test = train_test_split(features, species, test_size=0.33, random_state=42)

xgb_x_train = X_train.head()
xgb_y_train = y_train[:5]

print(xgb_x_train.shape)
print(len(xgb_y_train))

(5, 2)
5

xgbclf = xgb.XGBClassifier()
xgbclf.fit(xgb_x_train, xgb_y_train)

I expect the code above to produce a trained model (not fine-tuned, since it only uses 5 samples) in a "reasonable" time, i.e. less than 4-5 minutes, but the fit phase never completes.

Am I doing something badly wrong that could cause such long fit times?

Thanks for every suggestion! Mattia
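
One way to narrow this down is to time the fit with a single thread. The sketch below assumes, without any confirmation from the question, that the hang might be related to multithreading. `timed_fit` and `run_single_thread_check` are hypothetical helper names, and the five rows are made-up values in the shape of the (5, 2) sample above, not rows taken from Iris.csv.

```python
import time


def timed_fit(model, X, y):
    """Fit `model` on (X, y) and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


def run_single_thread_check():
    # Imported lazily so timed_fit stays usable without these packages.
    import numpy as np
    import xgboost as xgb

    # Illustrative values only (assumption): 5 samples, 2 features, 3 classes.
    X = np.array([[1.4, 0.2], [4.7, 1.4], [5.1, 1.9], [1.3, 0.2], [4.5, 1.5]])
    y = np.array([0, 1, 2, 0, 1])

    # n_jobs=1 restricts XGBoost to one thread; a small n_estimators keeps
    # the run short so a hang is easy to tell apart from a slow fit.
    model = xgb.XGBClassifier(n_estimators=10, n_jobs=1)
    print(f"single-thread fit took {timed_fit(model, X, y):.3f} s")
```

Calling `run_single_thread_check()` in the same Anaconda environment should return almost immediately on data this small. If it does while the default `xgb.XGBClassifier()` still hangs, that would point toward the threading setup rather than the data or the code.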
