Reputation: 1862
I want to create an ORC file compressed with ZLIB at compression level 9. The problem is that pyarrow.orc only lets me choose between the "Speed" and "Compression" strategies; I can't control the compression level itself.
E.g.
orc.write_table(table, '{0}_zlib.orc'.format(file_without_ext),
                compression='ZLIB', compression_strategy='COMPRESSION')
Ideally I'm looking for something like a compression_level parameter, which doesn't seem to exist. Any help would be appreciated.
Upvotes: 1
Views: 94
Reputation: 18072
The Apache ORC library (which other libraries use internally for ORC support) doesn't allow setting the compression level freely, in either the C++ or the Java implementation.
The C++ library supports only CompressionStrategy_SPEED and CompressionStrategy_COMPRESSION (source):
enum CompressionStrategy { CompressionStrategy_SPEED = 0, CompressionStrategy_COMPRESSION };
The Java library offers an additional FASTEST option (source):
enum SpeedModifier {
  /* speed/compression tradeoffs */
  FASTEST,
  FAST,
  DEFAULT
}
There is an open request in the project about this: "Support maximum compression ratio in setSpeed". It was created a year ago, but the feature has not been implemented so far.
So, unless you patch the library yourself, there is no way to set a high compression level.
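To judge whether level 9 would be worth patching for, you could compare zlib levels directly on a sample of your data with Python's standard zlib module. This is only a rough proxy, not a measurement of ORC itself: ORC encodes columns and compresses them in blocks, so real file sizes will differ. The sample data and the compressed_sizes helper below are made up for illustration.

```python
import zlib


def compressed_sizes(data, levels=(1, 6, 9)):
    """Return {level: compressed size in bytes} for each zlib level."""
    return {level: len(zlib.compress(data, level)) for level in levels}


# Hypothetical sample: repetitive CSV-like rows standing in for real column data.
sample = b"".join(b"%d,user_%d,active\n" % (i, i % 50) for i in range(20000))

sizes = compressed_sizes(sample)
for level, size in sizes.items():
    ratio = 100.0 * size / len(sample)
    print("level %d: %d bytes (%.1f%% of original)" % (level, size, ratio))
```

If the gap between level 6 and level 9 is small on your own data, the missing parameter may not cost much in practice.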
Upvotes: 2