Reputation: 910
I am running multiple parallel tasks on a multi-node distributed Dask cluster. However, once the tasks finish, the workers still hold on to large amounts of memory and the cluster fills up quickly.
I have tried `client.restart()` after every task and `client.cancel(df)`. The first kills the workers and sends a `CancelledError` to other running tasks, which is troublesome, and the second did not help much because we use a lot of custom objects and functions inside Dask's `map` calls. Adding `del` for known variables and `gc.collect()` also doesn't help much.
I am sure most of the memory being held is due to the custom Python functions and objects invoked via `client.map(..)`.
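To make the setup concrete, here is a minimal, hedged sketch of the pattern described above, using an in-process local cluster for illustration (the `process` function and the local cluster stand in for the real custom functions and the multi-node deployment):

```python
import gc
from dask.distributed import Client

# In-process cluster for illustration only; a real deployment would
# connect to the scheduler address instead, e.g. Client("tcp://...:8786").
client = Client(processes=False, n_workers=1, threads_per_worker=1)

def process(x):  # stand-in for the custom functions passed to client.map
    return x * 2

futures = client.map(process, range(4))
results = client.gather(futures)

# The cleanup attempts described above:
client.cancel(futures)  # cancel the futures explicitly
del futures             # drop the last local references
gc.collect()            # force a collection pass

client.close()
```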
My question is: how can I trigger a worker restart only when no tasks are currently running?

Upvotes: 6
Views: 3628
Reputation: 57271
If there are no remaining references to the futures, then Dask should drop its references to the Python objects you created with it and the workers should release that memory. See https://www.youtube.com/watch?v=MsnzpzFZAoQ for more information on how to investigate this.
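As a small local sketch of this (in-process cluster, list sizes are arbitrary), you can watch the scheduler forget the result keys once the last future reference is gone, e.g. with `Client.who_has()`:

```python
import time
from dask.distributed import Client

client = Client(processes=False, n_workers=1, threads_per_worker=1)

# Distinct arguments so each call gets its own key on the scheduler.
futures = client.map(lambda n: [0] * n, [10_000, 20_000, 30_000, 40_000])
client.gather(futures)

held_before = len(client.who_has())  # the four result keys are still held

del futures   # drop the only reference to the futures
time.sleep(1) # give the scheduler a moment to process the release

held_after = len(client.who_has())   # keys should now be gone

client.close()
```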
If your custom Python code does have a memory leak of its own then yes, you can ask Dask workers to restart themselves periodically. See the `dask-worker --help` output and look for the options that start with `--lifetime`.
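For example (the scheduler address is a placeholder; the intervals are illustrative), a worker can be told to restart itself roughly every hour, staggered so all workers do not restart at the same moment:

```shell
dask-worker tcp://scheduler-address:8786 \
    --lifetime 1h \
    --lifetime-stagger 5m \
    --lifetime-restart
```

Because each worker waits until it is idle and restarts are staggered, this gives the periodic cleanup asked about above without killing in-flight tasks across the whole cluster at once.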
Upvotes: 3