Reputation: 135
PCIe bus bandwidth and latency constrain how and when applications should copy data to and from GPUs.
When working with cuDF directly, I can efficiently move a single large chunk of data into a single DataFrame.
When using dask_cudf to partition my DataFrames, does Dask copy partitions into GPU memory one at a time, or in batches? Either way, is there significant overhead from issuing multiple copy operations instead of a single larger copy?
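For concreteness, here is a minimal sketch of the two approaches I'm comparing (the file name and column name are just placeholders):

```python
import cudf
import dask_cudf

# Direct cuDF: the whole file becomes one DataFrame, so the host-to-device
# transfer happens as (roughly) one large copy.
gdf = cudf.read_csv("data.csv")

# dask_cudf: the same file is split into several partitions, each of which is
# its own cudf DataFrame and is therefore read and transferred separately.
ddf = dask_cudf.read_csv("data.csv")
total = ddf["value"].sum().compute()
```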
Upvotes: 2
Views: 297
Reputation: 57261
This probably depends on the scheduler you're using. As of 2019-02-19 dask-cudf uses the single-threaded scheduler by default (cudf segfaulted for a while when used from multiple threads), so any transfers would happen sequentially unless you're using a dask.distributed cluster. If you are using a dask.distributed cluster, then presumably these transfers would happen concurrently across each of your GPUs.
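If you want to control this explicitly, a minimal sketch of choosing a scheduler (the cluster address is a placeholder):

```python
import dask

# Run the graph with the default, sequential single-threaded scheduler:
dask.config.set(scheduler="single-threaded")

# Or attach to a dask.distributed cluster so per-partition work (including
# host-to-device transfers) can run concurrently across workers/GPUs:
# from dask.distributed import Client
# client = Client("scheduler-address:8786")   # placeholder address
```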
It's worth noting that dask.dataframe + cudf doesn't do anything special on top of what cudf would do on its own. It's as though you made many cudf calls in a for loop, or in one for loop per GPU, depending on the scheduler choice above.
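For illustration, a rough sketch of that equivalence (the partition files are made up):

```python
import cudf

# A hand-written equivalent of what dask_cudf does under the single-threaded
# scheduler: one cudf call (and one host-to-device copy) per partition,
# executed sequentially in an ordinary for loop.
paths = ["part-0.csv", "part-1.csv", "part-2.csv"]  # hypothetical partition files
partitions = []
for path in paths:
    partitions.append(cudf.read_csv(path))  # each call transfers one partition

# With a dask.distributed cluster the same per-partition calls would instead be
# scheduled across workers (e.g. one worker per GPU) and run concurrently.
```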
Disclaimer: cudf and dask-cudf are in heavy flux. Future readers probably should check with current documentation before trusting this answer.
Upvotes: 1