Reputation: 777
I am doing a very simple data transformation with Dask_ML
and I am getting this error, I was wondering if anyone has encountered this. Looks like a system setting that can be modified?
df.head()
ReportDate_Time 2015-05-01 2015-06-01 2015-07-01 2015-08-01 2015-09-01 2015-10-01 2015-11-01 2015-12-01 2016-01-01 2016-02-01 ... 2017-04-01 2017-05-01 2017-06-01 2017-08-01 2017-09-01 2017-10-01 2017-11-01 2017-12-01 2018-01-01 2018-02-01
0 33.0 32.0 32.0 24.0 24.0 32.0 31.0 31.0 31.0 35.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
1 670.0 978.0 1061.0 1048.0 525.0 936.0 804.0 1067.0 859.0 647.0 ... 407.0 457.0 517.0 388.0 345.0 428.0 688.0 1486.0 1090.0 0.0
2 132.0 130.0 127.0 137.0 92.0 96.0 112.0 124.0 126.0 112.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
3 206.0 259.0 207.0 181.0 74.0 142.0 125.0 172.0 138.0 194.0 ... 188.0 211.0 239.0 179.0 159.0 197.0 318.0 315.0 189.0 190.0
4 290.0 311.0 401.0 381.0 138.0 315.0 275.0 407.0 408.0 419.0 ... 76.0 56.0 159.0 16.0 0.0 0.0 213.0 123.0 4.0 3.0
5 rows × 33 columns
from dask_ml.preprocessing import StandardScaler
scaler = StandardScaler().fit_transform(df_pivot)
Gives Error:
CancelledError:
distributed.client - ERROR - Failed to reconnect to scheduler after 10.00 seconds, closing client
distributed.utils - ERROR -
Traceback (most recent call last):
File "/home/user/.local/lib/python3.6/site-packages/distributed/utils.py", line 662, in log_errors
yield
File "/home/user/.local/lib/python3.6/site-packages/distributed/client.py", line 1290, in _close
await gen.with_timeout(timedelta(seconds=2), list(coroutines))
concurrent.futures._base.CancelledError
distributed.utils - ERROR -
Traceback (most recent call last):
File "/home/user/.local/lib/python3.6/site-packages/distributed/utils.py", line 662, in log_errors
yield
File "/home/user/.local/lib/python3.6/site-packages/distributed/client.py", line 1019, in _reconnect
await self._close()
File "/home/user/.local/lib/python3.6/site-packages/distributed/client.py", line 1290, in _close
await gen.with_timeout(timedelta(seconds=2), list(coroutines))
concurrent.futures._base.CancelledError
Any ideas?
Upvotes: 0
Views: 2833
Reputation: 762
Code may have changed since @quasiben's answer. According to the client's constructor, the config name is different.
The timeout can be set done explicitly:
Client(..., timeout="50s")
or in the configs ("connect" instead of "tcp")
import dask
import distributed
dask.config.set({"distributed.comm.timeouts.connect": "50s"})
Upvotes: 2
Reputation: 1464
You can configure timeouts in dask.distributed with the following:
distributed:
comm:
timeouts:
tcp: 50s # time before calling an unresponsive connection dead
or in the code:
import dask
import distributed
dask.config.set({"distributed.comm.timeouts.tcp": "50s"})
Upvotes: 4