PseudoAccount

Reputation: 29

What is the difference between "none" and "uncompressed" Parquet file compression?

What is the difference between "none" and "uncompressed" Parquet file compression? Does either of these options have a significant memory advantage over the other?

Upvotes: 0

Views: 971

Answers (2)

mh0w

Reputation: 91

In Python pandas, at least, passing compression=None to to_parquet means no compression, i.e. the file is written uncompressed.

https://pandas.pydata.org/pandas-docs/version/1.1/reference/api/pandas.DataFrame.to_parquet.html

Upvotes: 0

Ben Watson

Reputation: 5531

There is no such thing as NONE Parquet file compression. The CompressionCodecName enum in parquet-mr (https://github.com/apache/parquet-mr/blob/master/parquet-common/src/main/java/org/apache/parquet/hadoop/metadata/CompressionCodecName.java) offers only:

UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD

The class also shows:

  public static CompressionCodecName fromConf(String name) {
    if (name == null) {
      return UNCOMPRESSED;
    }
    return valueOf(name.toUpperCase(Locale.ENGLISH));
  }

So if no compression codec is specified, it defaults to UNCOMPRESSED.
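To see what that lookup does with different inputs, here is a minimal sketch that runs with only the JDK. CodecName is a hypothetical stand-in that copies the constants listed above and the quoted fromConf logic; it is not the real org.apache.parquet.hadoop.metadata.CompressionCodecName, which would require the parquet-common dependency. It shows that a null name and "uncompressed" both resolve to UNCOMPRESSED, while "none" is rejected because Enum.valueOf finds no NONE constant.

```java
import java.util.Locale;

public class CodecNameDemo {

    // Hypothetical stand-in mirroring the constants listed in parquet-mr's
    // CompressionCodecName; it is NOT the real class and has no codec wiring.
    enum CodecName {
        UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD;

        // Same lookup logic as the quoted fromConf: null means UNCOMPRESSED,
        // anything else must match a constant name case-insensitively.
        static CodecName fromConf(String name) {
            if (name == null) {
                return UNCOMPRESSED;
            }
            return valueOf(name.toUpperCase(Locale.ENGLISH));
        }
    }

    public static void main(String[] args) {
        System.out.println(CodecName.fromConf(null));           // UNCOMPRESSED
        System.out.println(CodecName.fromConf("uncompressed")); // UNCOMPRESSED
        System.out.println(CodecName.fromConf("snappy"));       // SNAPPY
        try {
            CodecName.fromConf("none");
        } catch (IllegalArgumentException e) {
            // Enum.valueOf throws because there is no NONE constant to match.
            System.out.println("none -> IllegalArgumentException");
        }
    }
}
```

Copying the lookup instead of depending on parquet-common keeps the sketch self-contained; against the real enum, the same calls should behave the same way for these inputs.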

Upvotes: 2
