Sai

Reputation: 1117

Is it possible to compress a Parquet file that contains JSON data in a Hive external table?

I want to know how to compress a Parquet file that contains JSON data in a Hive external table. How can it be done?

I have created an external table like this:

create table parquet_table_name3 (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN
)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
LOCATION '/user/cloudera/parquet2';

and I set the compression property:

set parquet.compression=GZIP;

and compressed my input Parquet file by executing

gzip <file name>   (i.e. 000000_0.Parquet)

After that, I loaded the compressed (gzipped) file into the HDFS location /user/cloudera/parquet2.

Next, I tried to run the query below:

select * from parquet_table_name3;

and I am getting the following result:

NULL    NULL    NULL    NULL
NULL    NULL    NULL    NULL

Can you please let me know why I am getting NULL values instead of the actual data, and how to compress a Parquet file (containing JSON data) in a Hive external table? Can someone help me do this compression?

Upvotes: 0

Views: 2732

Answers (1)

Samson Scharfrichter

Reputation: 9067

Duh! You can't compress an existing Parquet file "from outside". It's a columnar format with a hellishly complicated internal structure, just like ORC; the file "skeleton" requires fast random access (i.e. no compression), and each data chunk has to be compressed separately because they are accessed separately.

It's when you create a new Parquet file that you request the SerDe library to compress data inside the file, based on the parquet.compression Hive property.
At read time, the SerDe then checks the compression codec of each data file and decompresses accordingly.
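As an illustration, here is a minimal sketch of that write-time approach. The table name tweets_parquet_gzip and the JSON-backed source table tweets_json are hypothetical, not from the original question, and STORED AS PARQUET assumes Hive 0.13+ (older versions need the explicit SerDe / input format / output format clauses, as in the question's DDL):

-- Ask Hive to GZIP-compress the Parquet files this session writes.
SET parquet.compression=GZIP;

-- A new external Parquet table; STORED AS PARQUET wires up the
-- Parquet SerDe and input/output formats.
CREATE EXTERNAL TABLE tweets_parquet_gzip (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN
)
STORED AS PARQUET
LOCATION '/user/cloudera/parquet_gzip';

-- Writing through Hive produces internally GZIP-compressed Parquet files;
-- tweets_json is a hypothetical table backed by the original JSON data.
INSERT OVERWRITE TABLE tweets_parquet_gzip
SELECT id, created_at, source, favorited
FROM tweets_json;

A plain SELECT * FROM tweets_parquet_gzip; then works unchanged, because the reader picks up the codec from the file metadata and decompresses each column chunk on the fly.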

A quick Google search returns a couple of must-reads such as this and that.

Upvotes: 3
