Reputation: 566
Working with a Julia notebook on SageMaker: an ml.m5d.24xlarge instance with 500 GB of memory.
I'm training an XGBoost model with 230 features (each input file is ~500 MB on average). Training runs without issue up to 205 files, but beyond that I randomly get this error:
┌ Info: Starting XGBoost training
└ num_boost_rounds = 99
ERROR: LoadError: Call to XGBoost C function XGBoosterUpdateOneIter failed: std::bad_alloc
Stacktrace:
[1] error(::String, ::String, ::String, ::String)
@ Base ./error.jl:42
[2] XGBoosterUpdateOneIter(handle::Ptr{Nothing}, iter::Int32, dtrain::Ptr{Nothing})
@ XGBoost ~/.julia/packages/XGBoost/fI0vs/src/xgboost_wrapper_h.jl:11
[3] #update#21
@ ~/.julia/packages/XGBoost/fI0vs/src/xgboost_lib.jl:204 [inlined]
[4] xgboost(data::XGBoost.DMatrix, nrounds::Int64; label::Type, param::Vector{Any}, watchlist::Vector{Any}, metrics::Vector{String}, obj::Type, feval::Type, group::Vector{Any}, kwargs::Base.Iterators.Pairs{Symbol, Any, NTuple{15, Symbol}, NamedTuple{(:objective, :num_class, :num_parallel_tree, :eta, :gamma, :max_depth, :min_child_weight, :max_delta_step, :subsample, :colsample_bytree, :lambda, :alpha, :tree_method, :grow_policy, :max_leaves), Tuple{String, Int64, Int64, Float64, Float64, Int64, Int64, Int64, Float64, Float64, Int64, Int64, String, String, Int64}}})
@ XGBoost ~/.julia/packages/XGBoost/fI0vs/src/xgboost_lib.jl:185
[5] macro expansion
@ /home/src/Training.jl:175 [inlined]
[6] macro expansion
@ ./timing.jl:210 [inlined]
I'm not sure how to fix this. The AWS instance already has the maximum available CPU memory, and I'm already using 99 processes/workers.
Upvotes: 2
Views: 1320
Reputation: 2826
This looks like you're trying to allocate more memory than is available on the machine: `std::bad_alloc` is what the XGBoost C++ library throws when an allocation fails, so training is exhausting the instance's RAM despite its size.
Unfortunately there is not much to do here other than sub-sampling your dataset or trying a larger instance; a sketch of row sub-sampling is below.
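For the sub-sampling route, here is a minimal sketch in Julia, assuming the older XGBoost.jl API that appears in your stack trace (positional `nrounds`, hyperparameters as keywords). `X` and `y` are hypothetical placeholders for the stacked feature matrix and label vector loaded from your files:

```julia
using XGBoost    # older 0.x API, matching the stack trace above
using Random

# X and y are hypothetical placeholders: the stacked n_rows × 230
# feature matrix and the label vector loaded from your files.
Random.seed!(1)
frac = 0.5    # fraction of rows to keep; lower it until training fits in RAM
keep = randperm(size(X, 1))[1:floor(Int, frac * size(X, 1))]

# Train on the sampled rows only; your other hyperparameters
# (eta, max_depth, tree_method, ...) pass through as keywords as before.
booster = xgboost(X[keep, :], 99; label = y[keep])   # 99 rounds, as in your log
```

If you control how the files are loaded, it also helps to sub-sample each file as you read it rather than concatenating everything first, so peak memory never holds the full dataset.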
An alternative is to try distributed training, using something like Dask: https://xgboost.readthedocs.io/en/stable/tutorials/dask.html
Upvotes: 1