Reputation: 149
I am having trouble finding an answer to this question:
I want to pre-split an HBase table into, for example, 5 regions. I have set the maximum file size in the configuration to 10 GB (just an example, of course). What happens if I fill all 5 regions of the table? Will HBase create a 6th region?
I found an opinion that a full region is automatically split into 2 regions, but I need to be sure and would like some explanation.
Thanks for all answers.
Upvotes: 1
Views: 2325
Reputation: 850
Let's first discuss pre-splitting.
It is only recommended when we know the distribution of the keys; otherwise, any skew in the data can leave the pre-split regions unevenly loaded.
Automatic and configurable sharding of tables is part of HBase's general design.
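To illustrate why the key distribution matters, here is a minimal sketch in plain Java (no HBase dependency). It assumes row keys start with a two-hex-digit prefix and that regions are chosen by simple lexicographic comparison; the class and method names are made up for illustration and are not part of the HBase API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PreSplitSketch {
    // Evenly spaced split keys over a two-hex-digit prefix (00..ff).
    // numRegions regions need numRegions - 1 split keys.
    static List<String> uniformSplitKeys(int numRegions) {
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            keys.add(String.format("%02x", i * 256 / numRegions));
        }
        return keys;
    }

    // Region index for a row key: the number of split keys that are <= rowKey.
    static int regionFor(String rowKey, List<String> splitKeys) {
        int region = 0;
        for (String split : splitKeys) {
            if (rowKey.compareTo(split) >= 0) {
                region++;
            }
        }
        return region;
    }

    public static void main(String[] args) {
        List<String> splits = uniformSplitKeys(5);
        System.out.println(splits); // [33, 66, 99, cc]

        // Skewed data: every key starts with "0", so all rows hit region 0.
        int[] counts = new int[5];
        for (int i = 0; i < 16; i++) {
            String key = String.format("0%x-row", i);
            counts[regionFor(key, splits)]++;
        }
        System.out.println(Arrays.toString(counts)); // [16, 0, 0, 0, 0]
    }
}
```

With uniformly distributed prefixes the five regions would fill at roughly the same rate; with the skewed keys above, four of the five pre-split regions stay empty.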
Quoting from the Cloudera HBase site:
Regardless of whether pre-splitting is used or not, once a region gets to a certain limit, it is automatically split into two regions.
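Applied to the numbers in the question, the behaviour can be modelled roughly as follows. This is a simplified sketch in plain Java, not HBase code: it assumes a constant-size threshold and splits a region into two byte-equal halves, whereas HBase splits at a row key and, depending on the version and split policy in use, may split before the configured maximum is reached. The names `RegionSplitSketch` and `splitOversized` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class RegionSplitSketch {
    static final long GB = 1024L * 1024 * 1024;

    // One pass over the table: every region larger than maxFileSize
    // is replaced by two daughter regions of about half the size.
    static List<Long> splitOversized(List<Long> regionSizes, long maxFileSize) {
        List<Long> result = new ArrayList<>();
        for (long size : regionSizes) {
            if (size > maxFileSize) {
                long left = size / 2;
                result.add(left);
                result.add(size - left);
            } else {
                result.add(size);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long maxFileSize = 10 * GB;

        // Five pre-split regions of 9 GB each; one of them grows to 11 GB.
        List<Long> regions = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            regions.add(9 * GB);
        }
        regions.set(2, 11 * GB);

        List<Long> after = splitOversized(regions, maxFileSize);
        System.out.println(after.size()); // 6
    }
}
```

So there is no separate "6th region" that receives overflow: whichever region crosses the limit is itself split in two, and the table's region count grows by one each time that happens.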
You can configure the default split policy to be used by setting the configuration “hbase.regionserver.region.split.policy”, or by configuring the table descriptor. We can also implement our own custom split policy, and plug that in at table creation time, or by modifying an existing table:
HTableDescriptor tableDesc = new HTableDescriptor("example-table");
// MyCustomSplitPolicy is a placeholder for your own RegionSplitPolicy subclass
tableDesc.setValue(HTableDescriptor.SPLIT_POLICY, MyCustomSplitPolicy.class.getName());
// add column families etc.
admin.createTable(tableDesc);
For more info: https://hortonworks.com/blog/apache-hbase-region-splitting-and-merging/
Upvotes: 4