Reputation: 294
I created a pandas DataFrame and used to_parquet(...) to write it directly to S3.
The arguments are:
df.to_parquet('s3://bucket/fn.parquet', compression='gzip', engine='fastparquet', partition_cols=['col1'])
When I read it back with pandas.read_parquet(url), the DataFrame loads fine.
But when I use modin.pandas.read_parquet(url), I get the following error:
File "/home/mguo/anaconda3/envs/testenv/lib/python3.7/site-packages/s3fs/core.py", line 1779, in __init__
self.req_kw["IfMatch"] = self.details["ETag"]
KeyError: 'ETag'
My versions are:
python==3.7.3
pandas==1.2.4
modin==0.10.0
s3fs==2021.6.0
Upvotes: 4
Views: 2736
Reputation: 176
An issue on the Modin GitHub tracked support for reading partitioned files with read_parquet in Modin, which is what you are trying to do here. A pull request on the Modin GitHub added that feature and resolved the issue. If you upgrade to the latest version of Modin (0.12.0), you should be able to read partitioned parquet files without the ETag KeyError.
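If you cannot upgrade right away, one possible stopgap is to check the installed Modin version and fall back to plain pandas, which the question reports as working. The sketch below is only an illustration: the helper names (parse_version, modin_reads_partitioned_parquet, read_parquet_any) are hypothetical, and the 0.12.0 threshold is taken from this answer rather than verified against the Modin changelog.

```python
import re

# Assumption: 0.12.0 is the first Modin release with the fix, as stated in the answer.
MIN_MODIN = (0, 12, 0)


def parse_version(text):
    """Return the leading numeric release part of a version string as a tuple of ints."""
    match = re.match(r"\d+(\.\d+)*", text)
    if match is None:
        raise ValueError("not a version string: %r" % (text,))
    return tuple(int(part) for part in match.group(0).split("."))


def modin_reads_partitioned_parquet(version):
    """True if the given Modin version is at least MIN_MODIN."""
    parts = parse_version(version)
    # Pad short versions such as "0.12" so the tuple comparison is well defined.
    parts = parts + (0,) * (3 - len(parts))
    return parts >= MIN_MODIN


def read_parquet_any(url):
    """Read with Modin when it is new enough, otherwise fall back to pandas."""
    try:
        import modin
        import modin.pandas as mpd
    except ImportError:
        mpd = None
    if mpd is not None and modin_reads_partitioned_parquet(modin.__version__):
        return mpd.read_parquet(url)
    import pandas as pd
    return pd.read_parquet(url)
```

With the versions listed in the question (modin==0.10.0), read_parquet_any('s3://bucket/fn.parquet') would take the pandas path and return a regular pandas DataFrame, so any Modin-specific parallelism is lost until the upgrade.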
Upvotes: 1