Santosh Kumar

Reputation: 791

dask dataframe set_index throws error

I have a dask dataframe created from a parquet file on HDFS. When I set the index with set_index, it fails with the error below.

  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/dataframe/shuffle.py", line 64, in set_index
    divisions, sizes, mins, maxes = base.compute(divisions, sizes, mins, maxes)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/base.py", line 206, in compute
    results = get(dsk, keys, **kwargs)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1949, in get
    results = self.gather(packed, asynchronous=asynchronous)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1391, in gather
    asynchronous=asynchronous)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 561, in sync
    return sync(self.loop, func, *args, **kwargs)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/utils.py", line 241, in sync
    six.reraise(*error[0])
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/six.py", line 693, in reraise
    raise value
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/utils.py", line 229, in f
    result[0] = yield make_coro()
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/gen.py", line 1055, in run
    value = future.result()
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/concurrent.py", line 238, in result
    raise_exc_info(self._exc_info)
  File "", line 4, in raise_exc_info
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/tornado/gen.py", line 1063, in run
    yielded = self.gen.throw(*exc_info)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/distributed/client.py", line 1269, in _gather
    traceback)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/ebs/d1/agent/conda/envs/py361/lib/python3.6/site-packages/dask/dataframe/io/parquet.py", line 144, in _read_parquet_row_group
    open=open, assign=views, scheme=scheme)
TypeError: read_row_group_file() got an unexpected keyword argument 'scheme'

Can someone point me to the cause of this error and how to fix it?

Upvotes: 3

Views: 928

Answers (1)

george

Reputation: 11

Solution

Upgrade fastparquet to version 0.1.3 or later.

Details

Dask 0.15.4, used for your example, includes this commit, which passes the argument scheme to read_row_group_file(). fastparquet versions before 0.1.3 do not accept that argument, so the call fails with the TypeError shown in your traceback.
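If you want to confirm which fastparquet the workers actually see, a rough check like the one below may help. It is only a sketch: the helper names (parse_version, is_new_enough, installed_fastparquet_version) are made up for illustration, the 0.1.3 threshold is the one from this answer, and it assumes fastparquet exposes a __version__ attribute, which may not hold for every release.

```python
import re

# Minimum version named in this answer; adjust if your dask release needs a newer one.
MIN_FASTPARQUET = (0, 1, 3)


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple.

    Parsing stops at the first component that does not start with a digit,
    so a suffix such as '.post1' is ignored.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def is_new_enough(version_text, minimum=MIN_FASTPARQUET):
    """True if version_text is at least `minimum` (compared numerically)."""
    return parse_version(version_text) >= minimum


def installed_fastparquet_version():
    """Return the installed fastparquet version string, or None if unavailable.

    Assumes the package exposes __version__; returns None when it does not
    or when fastparquet is not installed.
    """
    try:
        import fastparquet
    except ImportError:
        return None
    return getattr(fastparquet, "__version__", None)
```

Calling installed_fastparquet_version() on the client and on each worker (for example through client.run) and passing the result to is_new_enough() should tell you whether an upgrade is needed. If it is, something like `pip install --upgrade fastparquet` inside the py361 environment should do it, though a conda-managed environment may prefer the conda equivalent.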

Upvotes: 1
