denren

Reputation: 1

Since Dask 2.2.0, the read_parquet filters parameter no longer seems to work with the pyarrow engine

When I upgraded Dask from 2.1.0 to 2.2.0 (or 2.3.0), the following code changed its behaviour and stopped filtering parquet files as it did before. This only happens with the pyarrow engine (the fastparquet engine still filters correctly).

I tried pyarrow 0.13.1, 0.14.0 and 0.14.1 without success on Dask 2.2.0 and 2.3.0.

My previous working setup was Dask 2.1.0 with pyarrow 0.14.1.

This code was working with the pyarrow engine:

import dask.dataframe as dd
dd.read_parquet(directory, engine='pyarrow', filters=[(('DatePart', '>=', '2018-01-14'))])

Note that the equivalent code for the fastparquet engine needs one less level of nesting; this still works with fastparquet:

import dask.dataframe as dd
dd.read_parquet(directory, engine='fastparquet', filters=[('DatePart', '>=', '2018-01-14')])

My parquet storage is partitioned by 'DatePart', with _metadata files present.

Now the resulting dataframe is no longer filtered with the pyarrow engine, and no error message is raised.
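For context on the nesting levels mentioned above: these filters are expressed in disjunctive normal form (DNF), where the outer list ORs clauses together and each inner list ANDs (column, op, value) predicates. Below is a minimal pure-Python sketch of how such a DNF filter evaluates against rows; the `matches` helper and the sample rows are illustrative only, not part of dask's or pyarrow's API:

```python
import operator

# Map the filter's comparison strings to Python operators.
OPS = {'==': operator.eq, '!=': operator.ne, '<': operator.lt,
       '<=': operator.le, '>': operator.gt, '>=': operator.ge}

def matches(row, dnf):
    """Return True if `row` (a dict) satisfies the DNF filter:
    outer list = OR of clauses, inner list = AND of predicates."""
    return any(all(OPS[op](row[col], val) for col, op, val in clause)
               for clause in dnf)

rows = [{'DatePart': '2018-01-10'}, {'DatePart': '2018-01-20'}]
dnf = [[('DatePart', '>=', '2018-01-14')]]  # one clause, one predicate
kept = [r for r in rows if matches(r, dnf)]
# kept contains only the 2018-01-20 row
```

ISO-formatted date strings compare correctly lexicographically, which is why string comparison suffices in this sketch.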

Upvotes: 0

Views: 405

Answers (1)

MRocklin

Reputation: 57301

It sounds like you are trying to report a bug. I recommend reporting bugs at https://github.com/dask/dask/issues/new

See https://docs.dask.org/en/latest/support.html#asking-for-help for more information on where the Dask developers prefer to see questions.

Upvotes: 0
