Reputation: 972
I have 3 ubuntu machine(CPU). my dask scheduler and client both are present on the same machine, whereas the two dask workers are running on other two machines. when I launch first task, it gets scheduled on first worker, but then upon launching second worker, while the first one still executing, it does not get scheduled on second worker. here is the sample client code that I tried.
### client.py
from dask.distributed import Client
import time, sys, os, random
def my_task(arg):
print("doing something in my_task")
time.sleep(2)
print("inside my task..", arg)
print("again doing something in my_task")
time.sleep(2)
print("return some random value")
value = random.randint(1,100)
print("value::", value)
return value
client = Client("172.25.49.226:8786")
print("client::", client)
future = client.submit(my_task, "hi")
print("future result::", future.result())
print("closing the client..")
client.close()
I am running "python client.py" two times almost at the same time from two different terminal/machines. both the client seems to be executing, but it results in exactly the same output which it should not because the return type of the my_task() is a random value. I tested this on ubuntu machines.
However a month back, I was able to run same tasks in parallel on CentOs machines. And now if check back and ran same two tasks from those CentOs machines, the problem persist. This is strange. it did not run in parallel. Not able to figure out this behavior by dask. Am I missing any OS level settings or something else.?
Run the below almost at the same time,
python client.py # from one machine/terminal
python client.py # from another machine/terminal
these two tasks should run in parallel, each task should run on different worker(we have two free workers available), but this is not happening. I can't see any log on the second worker console nor on the scheduler, while the first task continues to execute. At the end I noticed both the tasks finishes exactly at the same time with exactly same output.
However the above client code works well in "parallel" in windows OS, each task running through multiple terminals. but I would like to run it on Ubuntu machines.
Upvotes: 0
Views: 708
Reputation: 57319
By default if you call the same function on the same inputs Dask will assume that this will produce the same value, and only compute it once. You can override this behavior with the pure=False
keyword
future = client.submit(func, *args, pure=False)
Upvotes: 4