Reputation: 2156
I am using Dask-ML to run some code that uses quite a bit of RAM during training. The training dataset itself is not large; it is the training process that consumes the memory. I keep getting the following error message, even though I have tried different values for n_jobs:
distributed.nanny - WARNING - Worker exceeded 95% memory budget. Restarting
What can I do?
P.S.: I have also tried a Kaggle Kernel (which allows up to 16 GB of RAM), and that didn't work either, so I am trying Dask-ML now. I am connecting to the Dask cluster with its default parameters, using the code below:
from dask.distributed import Client
import joblib
client = Client()
with joblib.parallel_backend('dask'):
    ...  # my own training code runs here
Upvotes: 0
Views: 622
Reputation: 1464
Dask has a detailed page on techniques to help with memory management. You might also be interested in configuring Dask workers to spill to disk. For example, rather than letting a worker climb all the way to the 95% termination threshold, you can have it start spilling excess data to disk at a lower memory fraction.
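On a single machine you can control both knobs when you create the client. A minimal sketch, assuming a local cluster; the threshold values, worker counts, and memory limit below are illustrative assumptions, not recommendations:

import dask
import joblib
from dask.distributed import Client

# Illustrative thresholds (assumptions): make workers spill to disk and
# pause well before the 95% termination point is reached.
dask.config.set({
    "distributed.worker.memory.target": 0.5,      # start moving data to disk
    "distributed.worker.memory.spill": 0.6,       # spill more aggressively
    "distributed.worker.memory.pause": 0.8,       # stop accepting new tasks
    "distributed.worker.memory.terminate": 0.95,  # last resort: restart worker
})

# Fewer workers, each with an explicit memory budget, also reduces pressure.
client = Client(n_workers=2, threads_per_worker=2, memory_limit="4GB")

with joblib.parallel_backend("dask"):
    ...  # your training code

Note that the configuration has to be set before the workers are created for the thresholds to take effect, and memory_limit is per worker, so two workers at 4GB each claim 8GB in total.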
Upvotes: 1