Mr.Teen

Reputation: 591

Is predicate pushdown available for compressed Parquet files?

In Spark 2.2, is predicate pushdown available for compressed Parquet files (e.g. GZIP, Snappy)?

Upvotes: 1

Views: 300

Answers (1)

Uwe L. Korn

Reputation: 8796

Yes, predicate pushdown works on all Parquet files, regardless of the compression codec. The important part here is that compression in the context of Parquet means that the data itself is compressed, while the metadata parts of the file, including the statistics kept for each chunk, are not compressed and are always stored in plain form. This allows any processor working on top of Parquet files to read the statistics of each chunk in a file and then load only the relevant parts of it.
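To make the idea concrete, here is a toy sketch in plain Python. It is not the Parquet format or any Spark API: the chunk layout and the function names `write_chunks` and `read_greater_than` are made up for illustration, and `zlib` stands in for a codec such as GZIP. It only shows why uncompressed min/max statistics let a reader skip compressed chunks without decompressing them.

```python
import json
import zlib


def write_chunks(values, chunk_size):
    """Split values into chunks (hypothetical layout, not real Parquet).

    Each chunk keeps its min/max statistics in plain form and stores
    the actual values as a zlib-compressed payload.
    """
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = values[start:start + chunk_size]
        chunks.append({
            # Metadata: readable without touching the compressed payload.
            "stats": {"min": min(part), "max": max(part)},
            # Data: compressed, must be decompressed before use.
            "data": zlib.compress(json.dumps(part).encode("utf-8")),
        })
    return chunks


def read_greater_than(chunks, threshold):
    """Return values > threshold and the number of chunks decompressed."""
    result = []
    decompressed = 0
    for chunk in chunks:
        # Pushdown step: if even the chunk maximum fails the predicate,
        # no value inside can match, so skip it without decompressing.
        if chunk["stats"]["max"] <= threshold:
            continue
        decompressed += 1
        part = json.loads(zlib.decompress(chunk["data"]).decode("utf-8"))
        result.extend(v for v in part if v > threshold)
    return result, decompressed


chunks = write_chunks(list(range(100)), chunk_size=10)
matches, chunks_read = read_greater_than(chunks, threshold=84)
```

With these inputs, only 2 of the 10 chunks are decompressed, and `matches` holds the values 85 through 99. How much is skipped on a real file depends on how the data is sorted and how the chunks were written.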

Upvotes: 3
