Y.S

Reputation: 1862

How to use Python to create an ORC file compressed with ZLIB compression level 9?

I want to create an ORC file compressed with ZLIB at compression level 9. The problem is that pyarrow.orc only lets me choose between the "SPEED" and "COMPRESSION" strategies; it gives me no control over the compression level itself.

E.g.

orc.write_table(table, '{0}_zlib.orc'.format(file_without_ext),
                compression='ZLIB', compression_strategy='COMPRESSION')

Ideally I'm looking for something like a compression_level parameter, which doesn't seem to exist. Any help would be appreciated.

Upvotes: 1

Views: 94

Answers (1)

Olivier

Reputation: 18072

The Apache ORC library (which other libraries use internally for ORC support) doesn't allow setting the compression level freely, in either the C++ or the Java implementation.

The C++ library supports only CompressionStrategy_SPEED and CompressionStrategy_COMPRESSION (source):

enum CompressionStrategy { CompressionStrategy_SPEED = 0, CompressionStrategy_COMPRESSION };

The Java library offers an additional FASTEST option (source):

enum SpeedModifier {
    /* speed/compression tradeoffs */
    FASTEST,
    FAST,
    DEFAULT
}

There is an open request in the project about this: Support maximum compression ratio in setSpeed. It was created a year ago but the feature has not been implemented so far.

So, unless you patch the library yourself, there is no way to set a high compression level.
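If you want a rough idea of how much level 9 would actually gain before going down the patching route, you can compare levels with Python's built-in zlib module. This is only a sketch: it compresses plain zlib streams over made-up CSV-like data, not ORC stripes, and the helper names make_sample and compressed_sizes are hypothetical. Whether an ORC file would show the same ratios is an assumption, since ORC compresses encoded column chunks rather than raw text.

```python
import random
import zlib


def make_sample(rows=20000, seed=0):
    """Build CSV-like bytes with some repetition (a stand-in, not real ORC data)."""
    rng = random.Random(seed)
    cities = ["Paris", "Berlin", "Tokyo", "Lima", "Cairo"]
    lines = [
        f"{i},{rng.choice(cities)},{rng.randint(0, 9999)}"
        for i in range(rows)
    ]
    return "\n".join(lines).encode("utf-8")


def compressed_sizes(data, levels=(1, 6, 9)):
    """Return {level: compressed size in bytes} using plain zlib streams."""
    return {level: len(zlib.compress(data, level)) for level in levels}


sample = make_sample()
sizes = compressed_sizes(sample)
for level, size in sizes.items():
    # Report each level's output size relative to the uncompressed input.
    print(f"level {level}: {size} bytes ({size / len(sample):.1%} of original)")
```

If the gap between the default level and level 9 is small on data that resembles yours, patching the library may not be worth the effort.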

Upvotes: 2
