Reputation: 1117
I want to know how to compress a Parquet file that contains JSON data in a Hive external table. How can this be done?
I have created an external table like this:
create table parquet_table_name3 (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN
)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
LOCATION '/user/cloudera/parquet2';
and I have set the compression property:
set parquet.compression=GZIP;
and compressed my input Parquet file by running
gzip <file name> (i.e. 000000_0.Parquet)
After that I loaded the compressed gzip file into the HDFS location /user/cloudera/parquet2.
Next I tried to run the query below:
select * from parquet_table_name3;
I am getting the result below:
NULL NULL NULL NULL
NULL NULL NULL NULL
Can you please let me know why I am getting NULL values instead of the data, and how to compress a Parquet file (one containing JSON data) in a Hive external table?
Upvotes: 0
Views: 2732
Reputation: 9067
Duh! You can't compress an existing Parquet file "from outside". It's a columnar format with a hellishly complicated internal structure, just like ORC; the file "skeleton" requires fast random access (i.e. no compression), and each data chunk has to be compressed separately because they are accessed separately.
It's when you create a new Parquet file that you request the SerDe library to compress the data inside the file, based on the parquet.compression Hive property.
At read time, the SerDe then checks the compression codec of each data file and decompresses accordingly.
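For example, a minimal sketch of the correct write path: here json_source is a hypothetical staging table exposing the same four columns as your target table; you let Hive write the compressed Parquet files itself instead of gzipping them afterwards.
set parquet.compression=GZIP;
-- Hive writes new Parquet files under /user/cloudera/parquet2
-- with each column chunk GZIP-compressed internally
insert overwrite table parquet_table_name3
select id, created_at, source, favorited
from json_source;  -- hypothetical source table holding the uncompressed rows
You would also remove the externally gzipped 000000_0.Parquet file from /user/cloudera/parquet2 first, since the SerDe cannot read a .gz-wrapped Parquet file (hence your NULLs).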
A quick Google search returns a couple of must-reads such as this and that.
Upvotes: 3