Reputation: 591
In Spark 2.2, is predicate pushdown available for compressed Parquet files (e.g. GZIP, Snappy)?
Upvotes: 1
Views: 300
Reputation: 8796
Yes, predicate pushdown works on all Parquet files, whatever the compression codec. The important part is that compression in Parquet applies only to the data itself; the metadata parts of the file, including the statistics (such as min/max values) of each column chunk, are always stored uncompressed. Any processor working on top of Parquet files can therefore read the statistics of each chunk first and then load and decompress only the relevant parts of the file.
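To make the idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Parquet's actual on-disk format or Spark's reader: the names `write_chunks` and `read_greater_than` are made up for illustration, and `zlib` plus JSON stand in for a real codec and page encoding. It only shows the principle that plain statistics let a reader skip compressed chunks without decompressing them.

```python
import json
import zlib


def write_chunks(values, chunk_size):
    """Split values into chunks: compress the data, keep min/max stats in plain form."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = values[start:start + chunk_size]
        chunks.append({
            # Plain (uncompressed) metadata, readable without touching the data.
            "stats": {"min": min(part), "max": max(part)},
            # Compressed payload, loosely analogous to compressed data pages.
            "data": zlib.compress(json.dumps(part).encode("utf-8")),
        })
    return chunks


def read_greater_than(chunks, threshold):
    """Return values > threshold, decompressing only chunks whose stats may match."""
    result = []
    decompressed = 0
    for chunk in chunks:
        if chunk["stats"]["max"] <= threshold:
            continue  # no value in this chunk can satisfy the predicate, so skip it
        decompressed += 1
        part = json.loads(zlib.decompress(chunk["data"]).decode("utf-8"))
        result.extend(v for v in part if v > threshold)
    return result, decompressed


chunks = write_chunks(list(range(100)), chunk_size=10)
matches, chunks_read = read_greater_than(chunks, 84)
print(matches, chunks_read)
```

With ten chunks of ten sorted values each, only the last two chunks have a maximum above 84, so the other eight are never decompressed. To check what Spark itself does for a given query, the physical plan printed by `df.explain()` lists the `PushedFilters` of a Parquet scan.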
Upvotes: 3