Reputation: 29
What is the difference between the none and uncompressed settings for Parquet file compression? Is there a significant memory advantage to either one?
Upvotes: 0
Views: 971
Reputation: 91
In Python pandas, at least, passing compression=None when exporting to Parquet means no compression, i.e. the file is written uncompressed.
https://pandas.pydata.org/pandas-docs/version/1.1/reference/api/pandas.DataFrame.to_parquet.html
Upvotes: 0
Reputation: 5531
There is no such thing as a NONE Parquet compression codec. The CompressionCodecName enum in parquet-mr (https://github.com/apache/parquet-mr/blob/master/parquet-common/src/main/java/org/apache/parquet/hadoop/metadata/CompressionCodecName.java) offers:
UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD
The class also shows:
    public static CompressionCodecName fromConf(String name) {
        if (name == null) {
            return UNCOMPRESSED;
        }
        return valueOf(name.toUpperCase(Locale.ENGLISH));
    }
So if no compression is specified, it defaults to UNCOMPRESSED.
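To see what that lookup does with different inputs, here is a minimal, self-contained sketch. The class CodecLookupSketch, the nested Codec enum and the describe helper are hypothetical stand-ins written for illustration. They are not the parquet-mr classes, and they copy only the fromConf logic and the codec names quoted above.

```java
import java.util.Locale;

class CodecLookupSketch {
    // Hypothetical stand-in for CompressionCodecName, limited to the names listed above.
    enum Codec {
        UNCOMPRESSED, SNAPPY, GZIP, LZO, BROTLI, LZ4, ZSTD;

        // Same logic as the quoted fromConf: null falls back to UNCOMPRESSED,
        // anything else must match an enum constant (case-insensitively).
        static Codec fromConf(String name) {
            if (name == null) {
                return UNCOMPRESSED;
            }
            return valueOf(name.toUpperCase(Locale.ENGLISH));
        }
    }

    // Illustrative helper: returns the resolved codec name, or "rejected"
    // when the string does not match any enum constant.
    static String describe(String name) {
        try {
            return Codec.fromConf(name).name();
        } catch (IllegalArgumentException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(null));           // UNCOMPRESSED
        System.out.println(describe("uncompressed")); // UNCOMPRESSED
        System.out.println(describe("none"));         // rejected
    }
}
```

Under this logic, an unset (null) value and the string "uncompressed" resolve to the same codec, while the literal string "none" matches no enum constant and valueOf throws IllegalArgumentException.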
Upvotes: 2