Reputation: 2639
In JupyterLab, ?dask.compute will show:
Signature: dask.compute(*args, **kwargs)
Docstring:
Compute several dask collections at once.
Parameters
----------
args : object
    Any number of objects. If it is a dask object, it's computed and the
    result is returned. By default, python builtin collections are also
    traversed to look for dask objects (for more information see the
    ``traverse`` keyword). Non-dask arguments are passed through unchanged.
traverse : bool, optional
    By default dask traverses builtin python collections looking for dask
    objects passed to ``compute``. For large collections this can be
    expensive. If none of the arguments contain any dask objects, set
    ``traverse=False`` to avoid doing this traversal.
scheduler : string, optional
    Which scheduler to use like "threads", "synchronous" or "processes".
    If not provided, the default is to check the global settings first,
    and then fall back to the collection defaults.
optimize_graph : bool, optional
    If True [default], the optimizations for each collection are applied
    before computation. Otherwise the graph is run as is. This can be
    useful for debugging.
kwargs
    Extra keywords to forward to the scheduler function.
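For example (a minimal sketch using dask.delayed; the function names are just illustrative), the documented parameters can be used like this:

import dask

@dask.delayed
def inc(i):
    return i + 1

x = inc(1)
y = inc(2)

# compute both collections at once on the synchronous scheduler,
# skipping graph optimization to make debugging easier
a, b = dask.compute(x, y, scheduler="synchronous", optimize_graph=False)
print(a, b)  # 2 3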
It says kwargs are "Extra keywords to forward to the scheduler function." But how can I know what extra keywords can be used here?
Upvotes: 2
Views: 368
Reputation: 2639
I found a doc page, scheduler-overview, which is not in the TOC of the dask docs. It mentions four get functions and says:
dask.threaded.get: a scheduler backed by a thread pool
dask.multiprocessing.get: a scheduler backed by a process pool
dask.get: a synchronous scheduler, good for debugging
distributed.Client.get: a distributed scheduler for executing graphs on multiple machines. This lives in the external distributed project.
For more information on the individual options for each scheduler, see the docstrings for each scheduler get function.
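For illustration (a minimal sketch, not from the linked page), such a get function can be called directly on a hand-written dask graph:

import dask.threaded

# a dask graph is just a dict mapping keys to values or (func, *args) tasks;
# keys appearing inside a list argument are resolved before the call
dsk = {"x": 1, "y": 2, "total": (sum, ["x", "y"])}
print(dask.threaded.get(dsk, "total"))  # prints 3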
So, in JupyterLab, if we type ?dask.multiprocessing.get, we get:
Signature:
dask.multiprocessing.get(
dsk,
keys,
num_workers=None,
func_loads=None,
func_dumps=None,
optimize_graph=True,
pool=None,
chunksize=None,
**kwargs,
)
so we know that num_workers, chunksize, etc. can be used in compute.
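For example (a minimal sketch; the __main__ guard matters because the "processes" scheduler spawns worker processes):

import dask

@dask.delayed
def double(i):
    return 2 * i

total = dask.delayed(sum)([double(i) for i in range(10)])

if __name__ == "__main__":
    # num_workers and chunksize are forwarded to dask.multiprocessing.get
    (result,) = dask.compute(total, scheduler="processes",
                             num_workers=4, chunksize=1)
    print(result)  # 90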
Upvotes: 2
Reputation: 16561
As per the comment by @furas, for an exhaustive list of arguments you will need to examine the source code. The relevant documentation to look at is the distributed API, especially the client.submit and client.compute entries.
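For instance, one quick way to see which extra keywords those accept (a minimal sketch; it assumes the distributed package is installed) is to inspect their signatures:

import inspect
from distributed import Client

print(inspect.signature(Client.submit))
print(inspect.signature(Client.compute))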
However, in practice, the ones that I tend to use are:
resources, specifying specific resources needed for the task (e.g. resources={"foo": 1} to make each task use 1 unit of some resource "foo")
priority, specifying task priority (e.g. priority=-10 to make this task less important relative to others)
key, this one has to be a unique name per task and I use it rarely, only to have a custom representation of tasks in the dashboard when debugging/monitoring long-running tasks.
Upvotes: 2