Reputation: 33
I am trying to implement a grid search using XGBoost and the Hyperopt library, but I run into the problem shown below: at the 213th configuration, an out-of-memory error appears. Since my dataset is not very large, I doubt the problem is caused by the data itself, or even by the model parameters being searched: earlier configurations used more features or training points, yet their training did not stall.
terminate called after throwing an instance of 'thrust::system::system_error'
what(): parallel_for failed: out of memory
Aborted (core dumped)
I suspect it could be a problem of GPU memory accumulating across the various configurations tested, which would therefore need to be released from time to time, but I can't find anything about this.
Any ideas? Thank you.
Upvotes: 3
Views: 7780
Reputation: 103
This is a known issue with XGBoost; see the 'Memory Usage' section of https://xgboost.readthedocs.io/en/latest/gpu/index.html.
Hyperopt runs a long loop of XGBoost training instances, each of which keeps data in GPU memory. You have to free that memory by serializing your model/booster, or by deleting it once predictions have been carried out.
See this XGBoost issue for a workaround and more info: https://github.com/dmlc/xgboost/issues/4668
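A minimal sketch of that workaround, assuming a regression task on placeholder data (the search space, the tree_method="gpu_hist" setting, and the metric are illustrative, not taken from the original post): the Hyperopt objective below deletes the fitted model and forces garbage collection at the end of each trial, so the booster's GPU memory is released before the next configuration starts.

import gc

import numpy as np
import xgboost as xgb
from hyperopt import STATUS_OK, fmin, hp, tpe
from sklearn.metrics import mean_squared_error

# Placeholder data standing in for the real dataset.
rng = np.random.default_rng(0)
X = rng.random((1000, 20))
y = rng.random(1000)
X_train, X_valid = X[:800], X[800:]
y_train, y_valid = y[:800], y[800:]

def objective(params):
    model = xgb.XGBRegressor(
        tree_method="gpu_hist",               # GPU training, as in the question
        max_depth=int(params["max_depth"]),
        learning_rate=params["learning_rate"],
        n_estimators=100,
    )
    model.fit(X_train, y_train)
    loss = mean_squared_error(y_valid, model.predict(X_valid))

    # Workaround from the linked issue: drop the booster once predictions
    # are done, so its GPU memory is freed before the next trial runs.
    del model
    gc.collect()

    return {"loss": loss, "status": STATUS_OK}

space = {
    "max_depth": hp.quniform("max_depth", 3, 10, 1),
    "learning_rate": hp.loguniform("learning_rate", -5, 0),
}

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=300)

If you need to keep the best model rather than just its score, serialize it inside the objective (e.g. with model.save_model) before deleting it, then reload the winning one after the search finishes.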
Upvotes: 3