Reputation: 1
We have an HDInsight HBase cluster, and we observe that while major compactions are running, HBase becomes unreachable to client applications.
Please suggest best practices for handling this scenario.
Upvotes: 0
Views: 508
Reputation: 21
Regarding HDInsight HBase, I would like to share some ideas here.
1) Time-based major compaction is disabled by default; see hbase.hregion.majorcompaction=0
2) Regarding size-based compaction, the default compaction policy is ExploringCompactionPolicy, and hbase.hstore.compaction.max.size is set to 10GB, so StoreFiles larger than 10GB are excluded from compaction. hbase.hregion.max.filesize is set to 3GB, so once a region's HFiles grow beyond this value, the region is split.
The reason for these settings is that the largest blob HBase can create in Azure Storage is about 12GB, so a compaction that writes more than 12GB of data will eventually fail. You can definitely increase the max blob size (up to 200GB according to the Azure Storage documentation), but that will increase read/write latency and compaction time as well.
More context here: although Azure Blob storage has a 200GB limit for a single blob (4MB * 50k blocks), to get the best performance we limit fs.azure.read.request.size and fs.azure.write.request.size to 256KB in Hadoop's core-site.xml, so the max blob size in an HBase cluster is 256KB * 50k, around 12GB. If you set them to 4MB, the limit becomes 200GB. But 4MB will increase the latency of each read/write, and it allows HBase to compact up to 200GB of data, which can take hours.
3) Major compaction is costly, especially for cloud-based HBase, because storage latency is higher than with local disk/SSD. For read performance, you can set up a bucket cache on the local VM SSD, which should be turned on by default on the latest HDInsight HBase clusters.
There is definitely more tuning that can be done, such as VM size, cluster size, MemStore size, etc.
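The blob-size arithmetic in point 2 can be checked with a few lines of shell. This is only a sketch of the calculation (request size times 50k blocks); the function name is made up for illustration, and the result is shown in GiB, which is why 4MB requests come out slightly under the nominal 200GB figure.

```shell
#!/usr/bin/env bash
# Max blob size = request size * number of blocks per blob (50k, as quoted above).
BLOCKS=50000

# max_blob_gib is a hypothetical helper: takes the request size in KB
# and prints the resulting max blob size in whole GiB (rounded down).
max_blob_gib() {
  local request_kb=$1
  echo $(( request_kb * BLOCKS / 1024 / 1024 ))
}

echo "256KB requests: ~$(max_blob_gib 256) GiB"   # prints 12
echo "4MB requests:   ~$(max_blob_gib 4096) GiB"  # prints 195
```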
Upvotes: 2
Reputation: 558
It depends on your use case.
By default, major compactions are launched every 24 hours in older HBase versions (newer releases default to 7 days).
If you know when your cluster is not in use, you can disable automatic major compaction and run it at that time (typically at night). A script called by cron that launches major compaction through the hbase shell can do the job.
Since HBase 0.98.11 and HBase 1.1.0 you can limit compaction throughput; for more information, see the "Limit compaction speed" JIRA.
It is important to run major compactions because they improve HBase disk access by merging StoreFiles (removing deleted data from disk, sorting data by rowkey, ...).
hbase-site.xml :
<!-- Disable major compaction -->
<property>
<name>hbase.hregion.majorcompaction</name>
<value>0</value>
</property>
Run major compaction manually :
# Launch major compaction on all regions of table t1
$ echo "major_compact 't1'" | hbase shell
# Launch major compaction on region r1
$ echo "major_compact 'r1'" | hbase shell
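The cron-driven approach mentioned above could be sketched as a small wrapper script. Everything named here is an assumption for illustration: the script path, the helper function, the table names, and the 02:00 schedule. It simply pipes major_compact commands into hbase shell, the same way as the manual example.

```shell
#!/usr/bin/env bash
# Hypothetical script, e.g. saved as /usr/local/bin/hbase-major-compact.sh
# Example crontab entry (assumed schedule: every night at 02:00):
#   0 2 * * * /usr/local/bin/hbase-major-compact.sh t1 t2

# Print one major_compact command per table name, then an exit command.
build_compact_commands() {
  local table
  for table in "$@"; do
    printf "major_compact '%s'\n" "$table"
  done
  printf 'exit\n'
}

# Only invoke hbase when table names were passed on the command line.
if [ "$#" -gt 0 ]; then
  build_compact_commands "$@" | hbase shell
fi
```

Note that major_compact only requests the compaction and returns; the compaction itself runs in the background on the region servers, so the script finishing does not mean the compaction is done.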
Upvotes: 0