HBase standalone performance vs. running on an HDFS cluster

Question

My application talks to HBase heavily (hundreds or thousands of reads/writes per second). This strongly affects performance, probably because of the I/O operations HBase performs on every request.

[Chart: time cost with and without HBase. The Doo.dle entries are calls to my code; the difference between blue and red is the time consumed by HBase.]
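For anyone who wants to reproduce this kind of split, here is a minimal sketch of a timing harness that accumulates wall-clock time per label. It is not the profiler used for the chart. `CallTimer`, `fake_hbase_put` and `app_logic` are hypothetical names made up for illustration; the fake put only writes to a dict and would have to be replaced with the real HBase client call.

```python
import time
from collections import defaultdict


class CallTimer:
    """Accumulates wall-clock time and call counts per label."""

    def __init__(self):
        self.total_seconds = defaultdict(float)
        self.calls = defaultdict(int)

    def timed(self, label, func, *args, **kwargs):
        # Run func and add its elapsed time to the given label,
        # even if func raises.
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            self.total_seconds[label] += time.perf_counter() - start
            self.calls[label] += 1

    def share(self, label):
        # Fraction of all recorded time spent under this label.
        # Only meaningful when timed() calls are not nested.
        total = sum(self.total_seconds.values())
        return self.total_seconds[label] / total if total else 0.0


# Hypothetical stand-in for a real HBase write (assumption: swap in your client call).
def fake_hbase_put(store, key, value):
    store[key] = value


# Hypothetical stand-in for the application's own work.
def app_logic(n):
    return sum(range(n))


timer = CallTimer()
store = {}
for i in range(1000):
    result = timer.timed("app", app_logic, 100)
    timer.timed("hbase", fake_hbase_put, store, i, result)

print(f"hbase share of measured time: {timer.share('hbase'):.1%}")
```

Running the same loop once against standalone HBase and once against a cluster would give comparable per-label totals, which is the number the standalone vs. HDFS question really depends on.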

So far I've only tested in standalone mode, where HBase stores its data on the local file system. I'm wondering whether running it in distributed mode on an actual HDFS cluster could significantly improve performance, or whether it would just yield the same results. I'd like to get an idea before sinking too much time into getting a cluster up and running.

My second question is whether a standalone HBase can be configured to keep data only in memory (RAM) instead of writing it to the file system, so that I can take performance measurements without disk I/O.
