Akshat Mathur

Reputation: 67

How to check if ZLIB compression is enabled in hive tables?

The desc output shows the Compressed attribute as No.

How I created the table:

create table temp (.....) stored as orc tblproperties("orc.compress"="ZLIB")

Upvotes: 1

Views: 2909

Answers (3)

Takreem

Reputation: 1

The best way I found is to write a short Python script that reads the file with the pyorc package:

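import pyorc

# Open the ORC file in binary mode; the path is a placeholder.
with open('path/to/orc/file', 'rb') as orc_file:
    reader = pyorc.Reader(orc_file)
    # The codec is stored in the ORC file itself and exposed by the reader.
    compression_type = reader.compression
    print(compression_type)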

Upvotes: 0

Rahul

Reputation: 2374

The answer to your question is the describe formatted statement.

When you run this command with the following syntax

describe formatted <your table name>

you will see output on your screen, and part of it will look like this:

# Detailed Table Information             
Database:               default                  
Owner:                  edureka_268377           
CreateTime:             Thu Feb 22 04:56:05 UTC 2018     
LastAccessTime:         UNKNOWN                  
Protect Mode:           None                     
Retention:              0                        
Location:               hdfs://nameservice1/user/hive/warehouse/tests3   
Table Type:             MANAGED_TABLE            
Table Parameters:                
        orc.compress            ZLIB                
        transient_lastDdlTime   1519275365          

# Storage Information            
SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde        
InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  
OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat         
Compressed:             No                       
Num Buckets:            -1                       
Bucket Columns:         []                       
Sort Columns:           []                       
Storage Desc Params:             
        serialization.format    1     

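Pay attention to the Table Parameters section. It has a property called orc.compress, and here it says ZLIB, so ZLIB is your compression codec. If the codec were SNAPPY or something else, that value would be shown there instead. If the property is absent, the codec is ZLIB, the default one. As far as I know, the Compressed: No line under Storage Information does not reflect the ORC codec, so it is not the place to look.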
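If you want to check this from a script, the lookup described above can be sketched in Python. This is only an illustration: the function name orc_codec_from_describe is made up, it assumes you have already captured the describe formatted output as plain whitespace-separated text (as the Hive CLI prints it, not a Beeline table with | borders), and it falls back to ZLIB when the property is missing, as stated above.

```python
def orc_codec_from_describe(output: str, default: str = "ZLIB") -> str:
    """Return the orc.compress value from DESCRIBE FORMATTED text.

    Hypothetical helper: assumes plain whitespace-separated output.
    Falls back to `default` when the property is not listed.
    """
    for line in output.splitlines():
        parts = line.split()
        # A matching line looks like: "orc.compress    ZLIB"
        if len(parts) >= 2 and parts[0] == "orc.compress":
            return parts[1].upper()
    return default


# Sample text copied from the output shown above (shortened).
sample = """\
Table Parameters:
        orc.compress            ZLIB
        transient_lastDdlTime   1519275365
Compressed:             No
"""

print(orc_codec_from_describe(sample))
```

Note that the helper never looks at the Compressed line; it only reads the table property.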

Hope that helps!

Upvotes: 0

leftjoin

Reputation: 38290

You can use the orcfiledump utility:

hive --orcfiledump hdfs://table_location 

It prints the ORC file metadata, statistics, and compression information.

Compression information looks like this:

Rows: 95
Compression: SNAPPY
Compression size: 262144

See manual here: ORC File Dump Utility

The Hive command describe formatted table_name also prints the Table Parameters, which include the orc.compress parameter.
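If you need the codec in a script, one option is to save the orcfiledump output and pull out the Compression line. The sketch below assumes the output contains a line of the form "Compression: SNAPPY" as in the sample above; the function name compression_from_orcfiledump is hypothetical, and running the hive command itself is left out.

```python
import re
from typing import Optional


def compression_from_orcfiledump(output: str) -> Optional[str]:
    """Return the codec from a 'Compression: <codec>' line, or None.

    Hypothetical helper. The pattern requires the colon directly after
    'Compression', so the 'Compression size:' line is not matched.
    """
    match = re.search(r"^Compression:\s*(\S+)", output, flags=re.MULTILINE)
    return match.group(1) if match else None


# Sample text copied from the orcfiledump output shown above.
dump = """\
Rows: 95
Compression: SNAPPY
Compression size: 262144
"""

print(compression_from_orcfiledump(dump))
```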

Upvotes: 1
