D. Müller

Reputation: 3426

HBase: All data stored in one region

I'm importing HFiles into HBase using the command:

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles -Dcreate.table=no /user/myuser/map_data/hfiles my_table

When I looked at the HBase Master UI, I saw that all the data seems to be stored in one region:

[Screenshot of the HBase Master UI: all of my_table's data is in a single region]

The HFiles were created by a Spark application using this code:

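JavaPairRDD<String, MyEntry> myPairRDD = ...
// RDDs are immutable: keep the RDD returned by the repartition-and-sort call
JavaPairRDD<String, MyEntry> partitionedRDD = myPairRDD.repartitionAndSortWithinPartitions(new HashPartitioner(hbaseRegions));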
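One thing worth knowing about this partitioning step: Spark's `HashPartitioner` puts a key into partition `key.hashCode()` modulo the number of partitions (made non-negative), so each partition holds keys from the whole key range rather than a contiguous slice, and the partitions do not line up with HBase region boundaries. A region-aligned partitioner would instead look up which region each row key belongs to. The plain-Java sketch below shows only that lookup, assuming you already have the table's split keys (it sorts them in HBase's unsigned-byte order itself). `RegionIndex` is a hypothetical name, it needs Java 9+ for `Arrays.compareUnsigned`, and wrapping it in a subclass of `org.apache.spark.Partitioner` (overriding `numPartitions()` and `getPartition(Object)`) is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helper: finds the HBase region a row key falls into, given the
 * table's split keys. Region 0 starts at the empty key; region i (i >= 1)
 * starts at the i-th smallest split key.
 */
final class RegionIndex {
    private final byte[][] splits;

    RegionIndex(String... splitKeys) {
        splits = new byte[splitKeys.length][];
        for (int i = 0; i < splitKeys.length; i++) {
            splits[i] = splitKeys[i].getBytes(StandardCharsets.UTF_8);
        }
        // HBase orders row keys as unsigned bytes, so sort the splits the same way.
        Arrays.sort(splits, (a, b) -> Arrays.compareUnsigned(a, b));
    }

    /** A table with n split keys has n + 1 regions. */
    int numRegions() {
        return splits.length + 1;
    }

    /** Binary search for the number of split keys <= rowKey, i.e. the region index. */
    int regionFor(String rowKey) {
        byte[] key = rowKey.getBytes(StandardCharsets.UTF_8);
        int lo = 0;
        int hi = splits.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (Arrays.compareUnsigned(splits[mid], key) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        RegionIndex idx = new RegionIndex("0", "1", "2", "3", "4", "5", "6", "7");
        System.out.println(idx.numRegions());      // 9
        System.out.println(idx.regionFor("3abc")); // 4
    }
}
```

With the `'0'`..`'7'` splits from the `create` command in the answer there are 9 regions, and a key such as `3abc` lands in region 4, the one whose start key is `'3'`.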

Why is the data not split across all regions?

Upvotes: 1

Views: 985

Answers (1)

Ram Ghadiyaram

Reputation: 29155

Why is the data not split across all regions?

From the picture above, it seems your row keys were not salted properly before being loaded into HBase, so at the source table itself everything is being loaded into one particular region.

So your RDD carries the number of source partitions, which caused the hotspotting.

Look at the row key design section of the HBase docs.

So I would suggest pre-splitting the table into a number of regions (say 0 to 10) when you create it, and then prepending a prefix between 0 and 10 to each row key. This ensures a uniform distribution of data.
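A minimal sketch of the prefixing step, assuming string row keys and at most 10 buckets so the prefix stays a single digit (a two-digit prefix such as `10` would sort between `1` and `2`). This is a deterministic variant: the bucket is derived from the key's `hashCode()` rather than chosen at random, so the same original key always gets the same prefix and can be looked up again. `SaltedKey` and `salt` are hypothetical names, not an HBase API.

```java
/** Hypothetical helper: prepends a one-digit salt bucket to a row key. */
final class SaltedKey {
    private SaltedKey() {}

    /**
     * Deterministic salt: the same original key always maps to the same
     * bucket in [0, buckets), so it can be found again on read.
     */
    static String salt(String rowKey, int buckets) {
        if (buckets < 1 || buckets > 10) {
            // A single decimal digit keeps the prefixes sorting in bucket order.
            throw new IllegalArgumentException("buckets must be in 1..10");
        }
        int bucket = Math.floorMod(rowKey.hashCode(), buckets);
        return bucket + rowKey;
    }

    public static void main(String[] args) {
        // Prints the key with a one-digit prefix between 0 and 7.
        System.out.println(salt("user-00042", 8));
    }
}
```

With 8 buckets the prefixes are `0`..`7`, which matches the split points used in the `create` command of this answer.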

For example:

create 'tableName', {NAME => 'colFam', VERSIONS => 2, COMPRESSION => 'SNAPPY'}, 
    {SPLITS => ['0','1','2','3','4','5','6','7']}
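To see what the prefix buys you, here is a small plain-Java simulation (it does not talk to HBase) of a table pre-split at `'0'`..`'7'` as in the `create` command above, which gives 9 regions: one before `'0'` and one starting at each split key. It assumes ASCII row keys, so `String.compareTo` matches HBase's byte order, and uses made-up keys of the form `row-000000`; `SplitDistribution` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical simulation: counts how many rows would land in each region of
 * a table pre-split at '0'..'7' (9 regions), with and without a salt prefix.
 */
final class SplitDistribution {
    static final List<String> SPLITS =
            Arrays.asList("0", "1", "2", "3", "4", "5", "6", "7");

    /** Region index = number of split keys <= rowKey (ASCII keys assumed). */
    static int regionFor(String rowKey) {
        int region = 0;
        for (String split : SPLITS) {
            if (split.compareTo(rowKey) <= 0) {
                region++;
            }
        }
        return region;
    }

    /** Builds keys row-000000, row-000001, ... and tallies their regions. */
    static int[] histogram(boolean salted, int rows) {
        int[] counts = new int[SPLITS.size() + 1];
        for (int i = 0; i < rows; i++) {
            String key = String.format(Locale.ROOT, "row-%06d", i);
            if (salted) {
                // Hash-based one-digit prefix in 0..7, one bucket per split key.
                key = Math.floorMod(key.hashCode(), SPLITS.size()) + key;
            }
            counts[regionFor(key)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // [0, 0, 0, 0, 0, 0, 0, 0, 10000]
        System.out.println(Arrays.toString(histogram(false, 10_000)));
        System.out.println(Arrays.toString(histogram(true, 10_000)));
    }
}
```

In the unsalted run every `row-...` key sorts after `'7'`, so all rows go to the last region, which is exactly the hotspotting pattern. With the prefix, the region before `'0'` stays empty and the rows are spread across the other eight.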

The prefix can be any random ID generated within the range of the pre-splits.
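If the prefix really is random, the write side is trivial, but a reader no longer knows which bucket a given key went to, so a point lookup has to try every possible prefix (one Get or scan per bucket). A minimal sketch of both sides, with hypothetical names and a one-digit prefix in `[0, buckets)`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical helpers for a random salt prefix in [0, buckets).
 * Because the prefix is random, a reader has to try every possible prefix.
 */
final class RandomSalt {
    private RandomSalt() {}

    /** Write side: prepend a randomly chosen one-digit bucket. */
    static String salt(String rowKey, int buckets) {
        int bucket = ThreadLocalRandom.current().nextInt(buckets);
        return bucket + rowKey;
    }

    /** Read side: every salted key the original key could be stored under. */
    static List<String> candidates(String rowKey, int buckets) {
        List<String> keys = new ArrayList<>(buckets);
        for (int b = 0; b < buckets; b++) {
            keys.add(b + rowKey);
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(salt("user-00042", 8));
        // [0user-00042, 1user-00042, 2user-00042]
        System.out.println(candidates("user-00042", 3));
    }
}
```

One design consequence: writing the same logical key twice can pick two different prefixes and leave two separate rows behind, which is why a prefix derived from a hash of the key is a common alternative to a purely random one.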

This kind of row key avoids hotspotting even as the data grows, and the data will be spread across the region servers.

Also look at my answer.

Upvotes: 4
