Reputation: 10395
I am using HBase to store a lot of sensor data.
I tried storing the sensor data in a plain text file first: a 20MB file compresses down to about 1MB on disk.
My question is: does HBase itself compress the data automatically when writing it to disk?
Thanks
Upvotes: 4
Views: 5303
Reputation: 311
You can also alter your table to add compression support later. Your data will then actually be compressed at the next compaction (as ali said, because a new HFile is written to disk). As far as I understand, the compression algorithm is applied at the block level, not to the whole HFile. That means that when reading data, HBase doesn't have to decompress a several-GB HFile, only a data block of a few KBs.
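The alter-then-compact flow above can be sketched in the HBase shell (the table name `sensor_data` and family name `cf` are placeholders for your own schema):

```shell
# Enable GZ compression on an existing table's column family.
disable 'sensor_data'
alter 'sensor_data', {NAME => 'cf', COMPRESSION => 'GZ'}
enable 'sensor_data'

# Existing HFiles stay uncompressed until they are rewritten;
# trigger a major compaction to rewrite (and compress) them now.
major_compact 'sensor_data'
```

Until the compaction runs, only newly flushed HFiles are compressed; the old ones remain as they were written.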
Upvotes: 1
Reputation: 20242
You can use LZO, GZIP, or Snappy for HBase compression. You will need to install LZO/Snappy yourself if you wish to use them (GZIP support is included).
Normally, LZO is faster than GZIP, though GZIP usually achieves a better compression ratio. Snappy compresses and decompresses very quickly, but its compression ratios are normally worse.
When creating a table, you can specify the compression library per column family; HFiles are then compressed when written to disk (and decompressed when read).
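A minimal sketch of specifying compression at table-creation time in the HBase shell (`sensor_data` and `cf` are example names; Snappy requires the native libraries to be installed):

```shell
# Create a table whose 'cf' column family uses Snappy compression.
create 'sensor_data', {NAME => 'cf', COMPRESSION => 'SNAPPY'}

# Inspect the table to confirm the COMPRESSION setting on the family.
describe 'sensor_data'
```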
Hope it helps.
Upvotes: 2