sp_user123

Reputation: 502

How do HBase random writes work?

I am new to HBase. HBase is good for random updates (put or delete) to a table, but I am unable to understand how it performs them, since HBase uses HDFS for its storage and it is not possible to update anything in place in HDFS. HBase writes every edit to the MemStore first, so the MemStore contains an arbitrary number of updated rows in sorted key order. When it dumps that data to disk as an HFile, is this HFile globally sorted with respect to the other HFiles?

After an HFile is written, is it replicated in HDFS? The same question applies to the WAL edit log: is the WAL file also replicated in HDFS? Is every single update replicated to HDFS?

Upvotes: 2

Views: 2145

Answers (1)

th30z

Reputation: 1942

These blog posts may help you: http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/ and http://blog.cloudera.com/blog/2012/06/hbase-write-path/

But basically, you send a put(key, value), which is written to the WAL (for recovery) and to the MemStore. When the MemStore reaches a threshold, its contents are written to disk in sorted order as an HFile. After a while you have multiple HFiles on disk. Since you know that each file has sorted content, you can perform a (sorted) merge across them to query your data.
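To make that flow concrete, here is a rough toy model in Python. It is only a sketch of the idea, not how HBase is implemented: the class name `ToyStore`, the key-count `flush_threshold`, and the in-memory lists standing in for the WAL and HFiles are all made up for illustration.

```python
import heapq


class ToyStore:
    """Toy model of the write path: WAL + memstore, flush to sorted files."""

    def __init__(self, flush_threshold=2):
        # Hypothetical threshold: counts keys, whereas a real store would track bytes.
        self.flush_threshold = flush_threshold
        self.wal = []        # append-only log of (seq, key, value) edits
        self.memstore = {}   # key -> (seq, value), the newest edit per key
        self.hfiles = []     # each entry is an immutable sorted list of (key, seq, value)
        self.seq = 0

    def put(self, key, value):
        self.seq += 1
        self.wal.append((self.seq, key, value))   # logged first, for recovery
        self.memstore[key] = (self.seq, value)    # then applied in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memstore out as a new sorted file; existing files are never modified.
        self.hfiles.append(sorted((k, s, v) for k, (s, v) in self.memstore.items()))
        self.memstore.clear()
        self.wal.clear()     # the flushed edits no longer need the log

    def scan(self):
        # Sorted merge over all files plus the memstore. Entries arrive ordered by
        # (key, seq), so the last value seen for a key is the newest one.
        mem = sorted((k, s, v) for k, (s, v) in self.memstore.items())
        latest = {}
        for key, _seq, value in heapq.merge(*self.hfiles, mem):
            latest[key] = value
        return list(latest.items())


store = ToyStore(flush_threshold=2)
for key, value in [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]:
    store.put(key, value)
result = store.scan()
```

In this run the two flushed files both contain key "b", so each file is sorted internally but the files are not globally sorted against each other. The merge at read time is what reconciles them.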

The WAL is only used in case of a crash: if your data is in the MemStore and the machine crashes, the only copy you have is in the WAL. Once your data is flushed, the WAL containing the MemStore data can be removed.
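As a rough illustration of that recovery idea (again a toy sketch, with the hypothetical helper `replay` and a plain list standing in for the log file), replaying the logged edits in order rebuilds whatever the lost memstore contained:

```python
def replay(wal):
    """Rebuild a memstore by re-applying logged edits in their original order."""
    memstore = {}
    for key, value in wal:
        memstore[key] = value   # later edits to the same key overwrite earlier ones
    return memstore


# Edits that were logged but not yet flushed when the (imaginary) crash happened.
wal = [("row1", "a"), ("row2", "b"), ("row1", "c")]
recovered = replay(wal)
```

Once those edits have been flushed to a file on disk, the log entries are redundant, which is why they can be dropped.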

Upvotes: 4
