Reputation: 1

How to handle HDInsight HBase major compactions? HBase is not reachable while a major compaction takes place

We have an HDInsight HBase cluster, and we observe that while a major compaction is running, HBase becomes unreachable to client applications.

Please suggest best practices for handling this scenario.

Upvotes: 0

Answers (2)

onpduo

Reputation: 21

Regarding HDInsight HBase, I would like to share some ideas here.

1) Time-based compaction is disabled by default, see hbase.hregion.majorcompaction=0

2) For size-based compaction, the default compaction policy is ExploringCompactionPolicy, and hbase.hstore.compaction.max.size is set to 10GB, so no compaction larger than 10GB will happen.

hbase.hregion.max.filesize is set to 3GB, so once a region's HFiles grow to exceed this value, the region is split. The reason for these settings is that the largest blob HBase can create in Azure Storage is about 12GB, so a compaction of more than 12GB of data will eventually fail. You can definitely increase the max blob size (up to 200GB per the Azure Storage documentation), but that will increase read/write latency and compaction time as well.
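To see which of the values from points 1 and 2 are actually set on a given cluster, the configuration file can be inspected directly. The following is only a rough sketch: the helper name get_hbase_prop is made up for illustration, the path /etc/hbase/conf/hbase-site.xml is an assumption about where the file lives, and the awk parsing assumes the usual layout in which each <value> follows its <name>.

```shell
#!/bin/sh
# Assumed location of the HBase config; adjust for your cluster.
HBASE_SITE="${HBASE_SITE:-/etc/hbase/conf/hbase-site.xml}"

# get_hbase_prop FILE NAME (hypothetical helper):
# print the <value> that follows <name>NAME</name> in FILE.
get_hbase_prop() {
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

# Print the settings discussed above, if the file is readable.
if [ -r "$HBASE_SITE" ]; then
  for prop in hbase.hregion.majorcompaction \
              hbase.hstore.compaction.max.size \
              hbase.hregion.max.filesize; do
    echo "$prop = $(get_hbase_prop "$HBASE_SITE" "$prop")"
  done
fi
```

A property that is not listed in the file prints an empty value, which means the built-in default applies rather than that the setting is absent.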

More context:

Although Azure Blob Storage has a 200GB limit for a single blob (4MB * 50k blocks), to get the best performance we limit fs.azure.read.request.size and fs.azure.write.request.size to 256KB in Hadoop's core-site.xml, so the max blob size in an HBase cluster is 256KB * 50k, around 12GB. If you set them to 4MB, the limit becomes 200GB. But 4MB will increase the latency of each read/write, and you would allow HBase to compact up to 200GB of data, which can last for hours.
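The arithmetic behind those two limits can be checked with a few lines of shell. This is just a sketch of the calculation; the function name max_blob_bytes is made up for illustration, and it treats KB/MB as binary units (1024-based), which is an assumption.

```shell
#!/bin/sh
# A single blob is capped at 50,000 blocks (the "50k" figure quoted above).
MAX_BLOCKS=50000
GIB=$((1024 * 1024 * 1024))

# max_blob_bytes REQUEST_SIZE_BYTES (hypothetical helper):
# largest blob that can be written with the given block size.
max_blob_bytes() {
  echo $(( $1 * MAX_BLOCKS ))
}

# Compare the 256 KiB setting with the 4 MiB (4096 KiB) alternative.
for size_kib in 256 4096; do
  bytes=$(max_blob_bytes $((size_kib * 1024)))
  echo "${size_kib} KiB blocks -> ${bytes} bytes (about $((bytes / GIB)) GiB)"
done
```

With integer division this comes out at about 12 GiB for 256 KiB blocks and about 195 GiB for 4 MiB blocks, which lines up with the "around 12GB" and "200GB" figures.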

3) Major compaction is costly, especially for cloud-based HBase, because storage latency is higher than with local disk/SSD. For read performance, you can set up a bucket cache on the local VM SSD, which should be turned on by default on the latest HDInsight HBase clusters.

There is definitely more tuning that can be done, such as VM size, cluster size, MemStore size, etc.

Upvotes: 2

Azwaw

Reputation: 558

It depends on your use case.

By default, stock HBase launches major compactions periodically: every 24 hours in older releases, every 7 days in more recent ones.

If you know when your cluster is idle, you can disable automatic major compaction and run it manually at that time (typically at night). A script called by cron that launches major compaction through the hbase shell can do the job.
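A minimal sketch of such a cron-driven script could look like this. The table list, the script path, the log path and the 02:00 schedule are all placeholders, and build_compaction_commands is a hypothetical helper name. Note that major_compact only queues the compaction, so the script returns before the compaction itself has finished.

```shell
#!/bin/sh
# Hypothetical list of tables to compact; replace with your own.
TABLES="t1"

# Print one major_compact command per table name given as an argument.
build_compaction_commands() {
  for table in "$@"; do
    printf "major_compact '%s'\n" "$table"
  done
}

# Only call HBase when the hbase launcher is on PATH, so the script
# can be dry-run on a machine without HBase installed.
if command -v hbase >/dev/null 2>&1; then
  # $TABLES is intentionally unquoted so it splits into separate names.
  build_compaction_commands $TABLES | hbase shell
else
  build_compaction_commands $TABLES
fi

# Example crontab entry (assumed paths; runs every day at 02:00):
# 0 2 * * * /usr/local/bin/hbase-major-compact.sh >> /var/log/hbase-major-compact.log 2>&1
```

Keeping the command generation in its own function makes it easy to check what would be sent to the hbase shell before scheduling the job.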

Since HBase 0.98.11 and HBase 1.1.0 you can limit compaction throughput; see the "Limit compaction speed" JIRA for more information.

It is important to run major compactions because they improve HBase disk access by merging StoreFiles (removing deleted data from disk, sorting data by row key, ...).

hbase-site.xml :

<!-- Disable major compaction -->
<property> 
  <name>hbase.hregion.majorcompaction</name> 
  <value>0</value> 
</property>

Run major compaction manually :

# Launch major compaction on all regions of table t1
$ echo "major_compact 't1'" | hbase shell
# Launch major compaction on region r1 (major_compact is an hbase shell command, so pipe it in as well)
$ echo "major_compact 'r1'" | hbase shell

Upvotes: 0
