Reputation: 3621
I have a dask
Client
with its workers.
And I want ho make my calculation in two steps:
1) Run the pre-calculation code (eats small settings objects, calculates slowly and generates rather big intermediate structures) one time per each worker, save the intermediate data on each worker.
2) Run the calculation function (much faster than pre-calculation, runs many times per each worker, and uses intermediate data saved on each worker).
How can I do this?
Upvotes: 1
Views: 67
Reputation: 28684
You do not need to do anything special for this to take place. Dask takes pains to schedule tasks to happen on workers where the data required by those tasks already resides. There are also heuristics in place comparing the size of the data, the transfer speed any work backlog to decide when it might be worthwhile to copy data to another worker.
Unless you are hitting specific problems with work distribution, it is likely that simply doing the normal thing: writing functions which depend on inputs using either delayed
, the collections or the futures interface, things will be sensibly scheduled for you.
https://distributed.readthedocs.io/en/latest/locality.html
Upvotes: 1