Reputation: 126
I am looking for an alternate way to compress files for read/write performance, and one of the avenues I have explored is Snappy compression.
So far, so good: I have been able to get the compressed data into HDFS and decompress it using the -text command to see the values. The real issue appears when I try to import the data into Hive.
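For context, the load and verification steps look roughly like this (the file name is a placeholder for my actual data):

# copy the Snappy-compressed file into HDFS (file name is a placeholder)
hadoop fs -put data.snappy /user/.../
# decompress and print the first few records to verify the contents
hadoop fs -text /user/.../data.snappy | head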
When I import the data into Hive, I create a simple external table and set the parameters needed to read a Snappy-compressed file...
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;
CREATE EXTERNAL TABLE IF NOT EXISTS test(...
..
)
LOCATION '/user/.../'
When I run SELECT COUNT(*) FROM test; I get the correct row count; however, when I run SELECT * FROM test LIMIT 100; all I see are NULL values. Why is this happening? Any thoughts?
Upvotes: 0
Views: 1157
Reputation: 1319
In this scenario, the MapReduce job that Hive generates cannot find the Snappy libraries, so it is unable to decompress the data. Try adding snappy.jar to the Hive auxpath; the jar is available in Sqoop's lib directory. Also, check the logs and configuration of the MapReduce job Hive generates for your query to verify whether snappy.jar is actually being loaded.
Setting the Hive auxpath requires starting the Hive shell with the following parameter: hive --auxpath
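For example (the jar path below is illustrative; point it at the snappy jar that ships with your distribution, such as the one in Sqoop's lib directory):

# start the Hive shell with the snappy jar on the aux classpath
hive --auxpath /usr/lib/sqoop/lib/snappy-java-<version>.jar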
Hope this answers your question.
Upvotes: 0