Reputation: 403
I have terabytes of data in my Hive warehouse and am trying to enable Snappy compression for it. I know that we can enable Hive compression using
hive> SET hive.exec.compress.output=true;
hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
while loading the data into Hive, but how do I compress the data that is already loaded?
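For context, with those two settings in place an insert like the one below (table names here are just placeholders) writes its output files Snappy-compressed, but that only covers data written after the settings were applied:
hive> INSERT OVERWRITE TABLE target_table SELECT * FROM staging_table;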
Upvotes: 1
Views: 5403
Reputation: 279
Hive's ORCFile format supports compressed storage. To convert existing data to ORCFile, create a new table with the same schema as the source table but stored as ORC, then insert into it. See below:
CREATE TABLE A_ORC (
  customerID int, name string, ...  -- remaining columns as in A
) STORED AS ORC TBLPROPERTIES ("orc.compress" = "SNAPPY");
INSERT INTO A_ORC SELECT * FROM A;
Here A_ORC is the new table and A is the source table.
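Once the copy is verified, you can optionally let the ORC table take over the original name so existing queries keep working (a sketch, assuming nothing else still needs table A in its old format):
-- confirm the new table carries the ORC/Snappy settings
DESCRIBE FORMATTED A_ORC;
-- then swap the names
DROP TABLE A;
ALTER TABLE A_ORC RENAME TO A;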
Here you can learn more about ORCFile.
Upvotes: 1