Benjamin Du

Reputation: 1881

Control the compression level when writing Parquet files using Polars in Rust

I found that, by default, Polars outputs Parquet files that are around 35% larger than the Parquet files Spark outputs on the same data. Spark uses snappy compression by default, and switching ParquetCompression to Snappy in Polars doesn't help. Is this because Polars uses a more conservative compression level? Is there any way to control the compression level of Parquet files in Polars? I checked the Polars docs, and it seems that only Zstd accepts a ZstdLevel (I'm not even sure whether that is a compression level).

Below is my code to write a DataFrame to a Parquet file using snappy compression.

use std::fs::File;
use std::io::BufWriter;
use polars::prelude::*;

let f = File::create("j.parquet").expect("Unable to create the file j.parquet!");
let bfw = BufWriter::new(f);
let pw = ParquetWriter::new(bfw).with_compression(ParquetCompression::Snappy);
pw.finish(&mut df).expect("Unable to write the Parquet file!");
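
For reference, snappy itself has no tunable compression level, which is consistent with only Zstd accepting a level in the Polars API. Below is a minimal sketch of setting an explicit Zstd level; ZstdLevel::try_new and the Zstd(Option<ZstdLevel>) variant are assumptions based on recent versions of the polars crate and may not exist in older releases.

use std::fs::File;
use polars::prelude::*;

fn write_zstd(df: &mut DataFrame) -> PolarsResult<()> {
    let f = File::create("j.zst.parquet")?;
    // ZstdLevel::try_new validates that the level is within zstd's supported range.
    let level = ZstdLevel::try_new(10)?;
    ParquetWriter::new(f)
        .with_compression(ParquetCompression::Zstd(Some(level)))
        .finish(df)?;
    Ok(())
}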

Upvotes: 0

Views: 1479

Answers (1)

ritchie46

Reputation: 14670

This is not (yet) possible in Rust Polars. It will likely land in the next release of arrow2, and then we can implement it in Polars as well.

If you want that functionality in Python Polars, you can leverage pyarrow for this purpose. Polars has zero-copy interop with pyarrow.
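
A minimal sketch of that workaround: pyarrow.parquet.write_table and its compression/compression_level parameters are pyarrow's API, not Polars'; compression_level only applies to codecs that support levels (e.g. zstd, gzip, brotli) and requires a reasonably recent pyarrow.

import polars as pl
import pyarrow.parquet as pq

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# to_arrow() converts the Polars DataFrame to a pyarrow Table without copying data.
table = df.to_arrow()

# pyarrow exposes both the codec and, for codecs that support it, the level.
pq.write_table(table, "j.parquet", compression="zstd", compression_level=10)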

Upvotes: 1
