Reputation: 29
I am trying to set up data retention for Parquet files in HDFS. What compression technique would be a good choice for files of this type that are already compressed with Snappy?
Upvotes: 2
Views: 5318
Reputation: 8796
Newer versions of Parquet support Zstandard and Brotli compression. Depending on the compression level you set, either should improve the compression ratio and speed over Snappy. This does require checking that all the tools you use already support Zstandard (or Brotli).
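For example, with PyArrow you could write a table using Zstandard and an explicit compression level (a minimal sketch; the table contents and file name here are made up for illustration):

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Toy data standing in for your real table.
table = pa.table({"id": list(range(1000)),
                  "value": [x * 0.5 for x in range(1000)]})

# Write Parquet with Zstandard; compression_level trades write speed
# for compression ratio (higher = smaller files, slower writes).
pq.write_table(table, "data.zstd.parquet",
               compression="zstd", compression_level=9)
```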
An important aspect of Parquet is that the compression is part of the format and the data chunks are compressed individually. This allows very efficient access to a compressed file without the need to fully decompress it. Applying compression on top of an existing Parquet file would remove this ability and severely hurt performance.
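So if you want better compression for retention, re-encode the file with the new codec inside the format instead of wrapping it in an outer compressor. A sketch with PyArrow (the file names and the `id` column are placeholders):

```python
import pyarrow.parquet as pq

# Re-encode within the format: read the Snappy-compressed file and
# write it back with Zstandard. Chunk-level compression is preserved.
table = pq.read_table("events.snappy.parquet")
pq.write_table(table, "events.zstd.parquet", compression="zstd")

# Selective reads stay cheap: only the pages of the requested column
# are decompressed, not the whole file.
ids = pq.read_table("events.zstd.parquet", columns=["id"])
```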
Upvotes: 6
Reputation: 5526
Snappy is the best choice for keeping the data compressed. Adding another layer of compression on top won't save you space, since the data is already compressed; it will only add decompression overhead whenever you read the data in the future. Better to stay with Snappy compression itself.
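If you want to verify this on your own data, you can gzip an existing Snappy-compressed Parquet file and compare sizes; the saving is usually marginal (a sketch; the path is a placeholder and the exact numbers depend on your data):

```python
import gzip
import os
import shutil

src = "events.snappy.parquet"  # placeholder: an existing Snappy Parquet file

# Wrap the already-compressed file in gzip and compare sizes.
with open(src, "rb") as f_in, gzip.open(src + ".gz", "wb") as f_out:
    shutil.copyfileobj(f_in, f_out)

print("original:", os.path.getsize(src), "bytes")
print("gzipped: ", os.path.getsize(src + ".gz"), "bytes")
```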
Upvotes: 1