As a complete newbie to machine learning in Python, I'm trying to train an XGBoost model to classify the iris dataset (https://www.kaggle.com/uciml/iris).
I'm currently focusing on XGBoost to gain some experience with it. My first training run, on 66% of the dataset with only 2 features, never completed (I interrupted it after 20 minutes). I also tried a very small sample (5 samples, 2 features), but it still doesn't finish.
Environment details: MacBook Pro 2017 with macOS 10.14.5, Python 3.7.3 via Anaconda Navigator 1.9.7.
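import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

# File downloaded from the Kaggle link above
iris = pd.read_csv('Iris.csv')
# Strip the 'Iris-' prefix from each label (e.g. 'Iris-setosa' -> 'setosa')
iris['Species'] = iris['Species'].str[5:]
features = iris[['PetalLengthCm', 'PetalWidthCm']]
# Encode the species names as integer class labels 0, 1, 2
species, labels = pd.factorize(iris['Species'])
X_train, X_test, y_train, y_test = train_test_split(features, species, test_size=0.33, random_state=42)

# Tiny subset: only the first 5 training rows
xgb_x_train = X_train.head()
xgb_y_train = y_train[:5]
print(xgb_x_train.shape)  # prints (5, 2)
print(len(xgb_y_train))   # prints 5

xgbclf = xgb.XGBClassifier()
xgbclf.fit(xgb_x_train, xgb_y_train)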
I expect the code above to produce a trained model (not a well-tuned one, since it only uses 5 samples) in a reasonable time, i.e. less than 4-5 minutes, but the fit phase never completes.
Am I doing something badly wrong that could cause such long fit times?
Thanks for any suggestions! Mattia
Maybe you didn't install XGBoost properly (this happened to me once on Windows). I suggest trying to reinstall it using conda install.
But in your case, you could also try running your code on Google Colab (https://colab.research.google.com): it gives you a free GPU and has everything already installed. This training should take only a few seconds.
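Before and after reinstalling, it can help to confirm whether the installation works at all without risking another frozen session. The sketch below runs a tiny fit in a separate Python interpreter using `subprocess.run` with a `timeout` (the `capture_output` and `text` arguments need Python 3.7+, which matches the question). A hang then shows up as a clear `"timeout"` instead of an endless wait. The helper name `xgboost_smoke_test`, the 60-second default, and the six iris-like sample rows are illustrative assumptions, not part of XGBoost.

```python
import subprocess
import sys

# Script run in a fresh interpreter: a tiny, made-up iris-like data set with
# two samples per class, trained with only 10 trees, then the library version.
SMOKE_SCRIPT = """
import numpy as np
import xgboost as xgb

X = np.array([[1.4, 0.2], [1.3, 0.2], [4.7, 1.4], [4.5, 1.5], [6.0, 2.5], [5.1, 1.9]])
y = np.array([0, 0, 1, 1, 2, 2])
xgb.XGBClassifier(n_estimators=10).fit(X, y)
print(xgb.__version__)
"""


def xgboost_smoke_test(timeout_s=60):
    """Hypothetical helper: return ('ok', version), ('error', stderr) or ('timeout', None)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", SMOKE_SCRIPT],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # run() kills the child process before re-raising, so nothing is left hanging
        return "timeout", None
    if result.returncode != 0:
        # Typically an import failure (broken or missing install)
        return "error", result.stderr.strip()
    # XGBoost may log warnings to stdout, so take the last line (the version)
    return "ok", result.stdout.strip().splitlines()[-1]


if __name__ == "__main__":
    status, detail = xgboost_smoke_test()
    print(status, detail)
```

An `"ok"` result means training itself works and the problem lies elsewhere; `"error"` or `"timeout"` suggests the installation is worth redoing.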