Cedric Reichenbach

Reputation: 9319

HBase standalone performance vs. running on an HDFS cluster

My application is connected to an HBase instance and communicates with it heavily (hundreds or thousands of reads/writes per second). This strongly affects performance, probably because of the I/O operations HBase performs on every request.

Figure: time cost with and without HBase. Doo.dle entries are calls to my code; the difference between blue and red is the time consumed by HBase.

So far I've only tested in standalone mode, where HBase stores data on the local file system. I was wondering whether running it in distributed mode on an actual HDFS could significantly improve performance, or whether it would just yield the same results. I'd like to get a rough idea before sinking too much time into getting a cluster up and running.

A second question is whether a standalone HBase could be configured to keep data only in memory (RAM) instead of writing it to the file system, for the purpose of performance measurements.

Upvotes: 5

Views: 1017

Answers (1)

Yosser Goupil

Reputation: 799

In standalone mode, HBase does not use HDFS; it runs all HBase daemons and a local ZooKeeper in the same JVM.

In pseudo-distributed mode, HBase can run against the local filesystem or against an instance of the Hadoop Distributed File System (HDFS). Either way everything still runs on a single host, so there is no difference in performance between standalone and pseudo-distributed mode.
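To make the difference between the two storage backends concrete, here is a minimal sketch of an hbase-site.xml for a pseudo-distributed setup on HDFS. The property names `hbase.rootdir` and `hbase.cluster.distributed` are standard HBase settings; the host, port, and paths are assumptions and depend on your own NameNode configuration.

```xml
<!-- hbase-site.xml (sketch; host, port and paths are placeholders) -->
<configuration>
  <!-- Where HBase stores its data. An hdfs:// URI points at HDFS;
       a file:// URI such as file:///tmp/hbase-data would keep the
       data on the local filesystem instead. -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:8020/hbase</value>
  </property>
  <!-- false: standalone, all daemons in one JVM.
       true: daemons run as separate processes (pseudo- or fully distributed). -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
```

Switching only the `hbase.rootdir` value between the two URI schemes should be enough to compare the local filesystem against HDFS on the same machine before investing in a full cluster.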

Fully distributed mode requires HDFS, which means that tasks are run as jobs, and in my experience that takes time.

So using HBase in fully distributed mode with an actual HDFS could significantly improve performance.

Upvotes: 1
