Reputation: 111
Recently I used Hadoop bulk load to put data into HBase. First, I called the HDFS API to write the data to a file in HDFS: 7,000,000 lines in total, 503 MB in size. Then I used org.apache.hadoop.hbase.mapreduce.ImportTsv and org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles to load the data into HBase.
The main thing I did was use the bulk load tool to put the data into HBase. After the bulk load finished, I found that the HBase table was 1.96 GB. The HDFS replication factor is 1, so I do not understand why the table is almost four times larger than the input.
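In outline, the two steps look something like this (the table name, column mapping, and paths here are just placeholders, not my real ones):
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,colfam1:abc \
  -Dimporttsv.bulk.output=hdfs:///tmp/hfile-output \
  mytable hdfs:///tmp/input.tsv
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles hdfs:///tmp/hfile-output mytable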
Upvotes: 3
Views: 3503
Reputation: 39943
There is a bit of overhead in storing the data since you have to store the names of the column qualifiers and such, but not 4x overhead. I have a few ideas, but definitely wouldn't mind hearing more details on the nature of the data and perhaps the stats on the table.
Do a
hadoop fs -dus /path/to/hbase/table/data
and see what that returns.
Also remember that the qualifier name is stored with every cell: colfam1:abc is pretty small and won't take up much space, but colfam1:abcdefghijklmnopqrstuvwxyz is going to take up quite a bit of space in the grand scheme of things!
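If you want to see exactly what is being written per cell, you can dump one of the table's HFiles; something along these lines (the region, family, and file names below are placeholders):
# print the HFile's metadata / file info
hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f /hbase/mytable/<region>/<colfam>/<hfile>
# print the stored key/values; each entry repeats the row key, family:qualifier, and timestamp
hbase org.apache.hadoop.hbase.io.hfile.HFile -p -f /hbase/mytable/<region>/<colfam>/<hfile> | head
Each KeyValue in that dump carries the row key, family, qualifier, and timestamp alongside the value, which is where the per-cell overhead comes from.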
Upvotes: 3