Reputation: 2074
Each block in Dask array A leads to a block in each of Dask arrays B0, B1, ... Bn. The B arrays are subsequently saved to disk as zarr.
For each block in A, the corresponding blocks in B together exceeds the memory of a computing node. The memory can hold one block of one B array.
If I am just using a single computer to calculate the B arrays, will Dask recalculate each blocks of A to keep the live blocks of the B arrays under memory limit?
Can I hint the memory usage of each task to the scheduler?
Upvotes: 1
Views: 35
Reputation: 57281
As of today Dask will never recalculate a task for memory reasons. It will store excess data to disk (assuming you are using the dask.distributed scheduler) with an LRU policy. Dask generally assumes that the result of individual tasks fits comfortably in memory.
Can I hint the memory usage of each task to the scheduler?
No.
Upvotes: 2