I would expect the first computation in the following code to take 3+ seconds and the second one to be much faster. What should I do to get dask to avoid redoing a computation whose result has already been returned to the client? (I had previously searched for an answer to this question regarding pure=True and have not found anything.)
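```python
import time

from dask import compute, delayed
from dask.distributed import Client


@delayed(pure=True)
def foo(a):
    time.sleep(3)  # simulate an expensive task
    return 1


if __name__ == "__main__":
    foo_res = foo(1)
    client = Client()

    t1 = time.time()
    # older dask releases spelled this compute(foo_res, get=client.get)
    results = compute(foo_res, scheduler=client)
    t2 = time.time()
    print("Time : {}".format(t2 - t1))

    # same delayed object again: expected to be near-instant
    t1 = time.time()
    results = compute(foo_res, scheduler=client)
    t2 = time.time()
    print("Time : {}".format(t2 - t1))
```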
Output:
```
Time : 3.01729154586792
Time : 3.0170397758483887
```
You need to use the persist method on the Client:
```python
foo_res = client.persist(foo_res)
```
This starts the computation in the background and keeps the result in memory for as long as some reference to foo_res remains in your Python session.
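A minimal end-to-end sketch of this fix, reusing foo from the question. Assumptions: dask.distributed is installed, and Client(processes=False) (an in-process cluster) stands in for the question's Client() so the demo is self-contained; timed_compute and main are hypothetical helpers, not dask APIs.

```python
import time

from dask import delayed
from dask.distributed import Client


@delayed(pure=True)
def foo(a):
    time.sleep(3)  # stand-in for an expensive computation
    return 1


def timed_compute(client, obj):
    """Hypothetical helper: compute obj on the cluster, return (result, seconds)."""
    t0 = time.time()
    result = client.compute(obj).result()  # blocks until the value is available
    return result, time.time() - t0


def main():
    # Assumption: an in-process cluster keeps the demo self-contained
    with Client(processes=False) as client:
        # persist submits the task now and returns a Delayed backed by a future
        foo_res = client.persist(foo(1))
        r1, dt1 = timed_compute(client, foo_res)  # waits for the 3 s task
        r2, dt2 = timed_compute(client, foo_res)  # result is already in memory
        print("Time : {}".format(dt1))
        print("Time : {}".format(dt2))
        return r1, r2, dt1, dt2


if __name__ == "__main__":
    main()
```

The first timing should still be about 3 seconds, while the second should be close to zero because the result is held in cluster memory; once the last reference to foo_res is dropped, the scheduler is free to release it.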
The relevant documentation page is here: http://distributed.readthedocs.io/en/latest/manage-computation.html
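On the pure=True part of the question: pure=True only makes the task key a deterministic hash of the function and its arguments, so identical calls share a key. The distributed scheduler still releases a result once compute has returned it and nothing else references that key, which is why the second compute in the question reruns the task. A small check, assuming only that dask is installed (no cluster needed):

```python
from dask import delayed


@delayed(pure=True)
def foo(a):
    return 1


a = foo(1)
b = foo(1)
c = foo(2)

# pure=True: same function and same arguments give the same task key
print(a.key == b.key)  # True
# different arguments give a different key
print(a.key == c.key)  # False
```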