spiralarchitect

Reputation: 910

Memory clean up of Dask workers

I am running multiple parallel tasks on a multi-node distributed Dask cluster. However, once the tasks are finished, the workers still hold a large amount of memory and the cluster fills up quickly.

I have tried client.restart() after every task and client.cancel(df). The first one kills the workers and sends a CancelledError to the other running tasks, which is troublesome, and the second one did not help much because we use a lot of custom objects and functions inside Dask's map functions. Adding del for known variables and calling gc.collect() also doesn't help much.

I am sure most of the memory is held because of the custom Python functions and objects called with client.map(..).
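
For context, a stripped-down version of the pattern looks like the sketch below (the scheduler address, the CustomState class and the process function are made up for illustration): large custom objects are used inside pure-Python task functions sent through client.map, and deleting the client-side variables plus calling gc.collect() does not bring the worker memory back down.

```python
from dask.distributed import Client
import gc

client = Client("tcp://scheduler:8786")   # hypothetical scheduler address

class CustomState:
    """Stand-in for the large custom Python objects used in our tasks."""
    def __init__(self, size=1_000_000):
        self.payload = list(range(size))

def process(chunk):
    """Pure-Python task function executed on the workers."""
    state = CustomState()
    return sum(chunk) + len(state.payload)

futures = client.map(process, [range(1_000) for _ in range(100)])
results = client.gather(futures)

# Clean-up attempts that did not free the worker memory for us:
client.cancel(futures)
del futures, results
gc.collect()
```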

My questions are:

  1. Is there a way, from the command line or otherwise, to trigger a worker restart when no tasks are currently running?
  2. If not, what are the possible solutions to this problem? It will be impossible for me to avoid custom objects and pure Python functions inside Dask tasks.

Upvotes: 6

Views: 3628

Answers (1)

MRocklin

Reputation: 57271

If there are no references to futures then Dask should delete any references to Python objects that you've created with it. See https://www.youtube.com/watch?v=MsnzpzFZAoQ for more information on how to investigate this.
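
In other words (a minimal sketch; the function and scheduler address below are made up), once the last client-side reference to a future is dropped, the scheduler lets the workers release the corresponding result:

```python
from dask.distributed import Client

client = Client("tcp://scheduler:8786")   # hypothetical address

def work(x):
    return [x] * 1_000_000                # result that occupies worker memory

futures = client.map(work, range(50))
results = client.gather(futures)          # copy the results to the client

# While any reference to `futures` survives on the client, the workers keep
# the corresponding results in memory. Dropping the last reference (or calling
# client.cancel(futures)) lets the scheduler release that data on the workers.
del futures
```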

If your custom Python code does have some memory leak of its own then yes, you can ask Dask workers to periodically restart themselves. See the dask-worker --help man page and look for the keywords that start with --lifetime.
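
As a hedged sketch of that approach, the lifetime options can also be set through Dask's configuration system before the workers start (the values below are arbitrary examples; the corresponding dask-worker flags are --lifetime, --lifetime-stagger and --lifetime-restart):

```python
import dask

# Assumed example values: restart each worker roughly once an hour, staggered
# by a few minutes so the whole cluster does not restart at the same moment.
# This configuration must be visible in the environment where the worker
# processes are launched.
dask.config.set({
    "distributed.worker.lifetime.duration": "1 hour",
    "distributed.worker.lifetime.stagger": "5 minutes",
    "distributed.worker.lifetime.restart": True,
})
```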

Upvotes: 3
