Reputation: 16982
Is there a way to use a pyarrow Parquet dataset to read only specific columns and, if possible, filter the data, instead of reading the whole file into a DataFrame?
Upvotes: 5
Views: 12546
Reputation: 63282
As of pyarrow==2.0.0, this is possible at least with pyarrow.parquet.ParquetDataset.
To read specific columns, its read and read_pandas methods have a columns option. You can also do this with pandas.read_parquet.
To read specific rows, its __init__ method has a filters option, so you pass the filters when constructing the dataset.
Upvotes: 6