Reputation: 1
Is this done via the HDFS API? If so, how is data locality achieved [assumption: the region server and the DataNode are on the same machine]? i.e. the NameNode will allocate DataNodes to store the data according to its own statistics.
Upvotes: 0
Views: 2708
Reputation: 34184
Yes. HBase uses HFileSystem, an encapsulation of the FileSystem object, to access data. See HFileSystem for more.
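To illustrate the idea of such an encapsulation, here is a minimal plain-Java sketch: a wrapper that delegates file reads to an underlying "file system" object, giving HBase a single place to hook in extra behaviour. All names here (`SimpleFs`, `WrappedFs`, `InMemoryFs`) are hypothetical stand-ins, not the actual HBase or Hadoop API.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the FileSystem interface the wrapper encapsulates.
interface SimpleFs {
    byte[] read(String path);
}

// A trivial in-memory backing "file system" for the demo.
class InMemoryFs implements SimpleFs {
    private final Map<String, byte[]> files = new HashMap<>();
    void put(String path, byte[] data) { files.put(path, data); }
    public byte[] read(String path) { return files.get(path); }
}

// The wrapper delegates to the real file system; HBase-specific logic
// (checksum handling, locality hints, etc.) could be added here without
// changing any caller.
class WrappedFs implements SimpleFs {
    private final SimpleFs delegate;
    WrappedFs(SimpleFs delegate) { this.delegate = delegate; }
    public byte[] read(String path) {
        return delegate.read(path);
    }
}

public class FsDemo {
    public static void main(String[] args) {
        InMemoryFs backing = new InMemoryFs();
        backing.put("/hbase/table/region/hfile", "kv-data".getBytes());
        SimpleFs fs = new WrappedFs(backing);
        System.out.println(new String(fs.read("/hbase/table/region/hfile")));
    }
}
```

The point of the pattern is that callers only ever see the interface, so the wrapped and unwrapped file systems are interchangeable.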
And for the rest of your question, you really should go through this link. Lars has explained it beautifully.
Upvotes: 0
Reputation: 383
Tariq is right about the use of HFileSystem to abstract away the interface for the client, but a much better explanation of how files are actually written to the HDFS DataNodes can be found at this link: HBase Architecture - HBase Storage http://ofps.oreilly.com/static/titles/9781449396107/figs/hbase-files.png
In short, for data locality to be maintained, the client contacts the ZooKeeper cluster to find the location of the ROOT region (a hostname, basically) for a particular row. It then queries that host to find the server hosting the .META. table, and queries that table in turn to find out which server has the row it needs. The client caches the locations of the ROOT and .META. tables along with the locations of the rows it has looked up.
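The lookup chain above (ZooKeeper, then ROOT, then .META., then the region server, with client-side caching) can be sketched in plain Java. Maps stand in for the real services; the hostnames and the single-region assumption are illustrative, not the actual HBase API.

```java
import java.util.HashMap;
import java.util.Map;

public class LookupDemo {
    // Stand-ins: ROOT maps the .META. table to a host; .META. maps rows
    // to region servers (one region assumed for simplicity).
    static final Map<String, String> rootTable = new HashMap<>();
    static final Map<String, String> metaTable = new HashMap<>();
    // The client caches locations so repeat lookups skip the chain.
    static final Map<String, String> clientCache = new HashMap<>();

    static String locateRow(String row) {
        // 1. Check the client-side cache first.
        String cached = clientCache.get(row);
        if (cached != null) return cached;
        // 2. ZooKeeper (not modelled) told us where ROOT lives; ask ROOT
        //    which host serves .META.
        String metaHost = rootTable.get(".META.");
        // 3. Ask .META. which region server hosts the row, then cache it.
        String server = metaTable.get(metaHost);
        clientCache.put(row, server);
        return server;
    }

    public static void main(String[] args) {
        rootTable.put(".META.", "meta-host");
        metaTable.put("meta-host", "rs1:60020");
        System.out.println(locateRow("row-42"));  // full lookup chain
        System.out.println(locateRow("row-42"));  // served from the cache
    }
}
```

The cache is what makes the three-hop lookup affordable: after the first request for a row, subsequent reads and writes go straight to the region server.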
In order to write the HFile to HDFS, the client issues a PUT against the HTable; the HRegionServer passes it to the HRegion instance, which then stores it in the MemStore (writing it to the write-ahead log first, unless the write-ahead flag is disabled for the Put). When the MemStore is full it gets flushed to an HFile on the DataNodes.
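That write path (PUT, optional WAL append, MemStore insert, flush to an HFile when the MemStore fills) can be sketched with plain Java collections. The threshold, names, and single-store layout are assumptions for the demo, not HBase internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class WritePathDemo {
    static final int MEMSTORE_LIMIT = 3;                 // tiny flush threshold for the demo
    static final List<String> wal = new ArrayList<>();   // stand-in write-ahead log
    // TreeMap keeps rows sorted, like the MemStore does before a flush.
    static final TreeMap<String, String> memStore = new TreeMap<>();
    static final List<TreeMap<String, String>> hfiles = new ArrayList<>(); // "flushed" files

    static void put(String row, String value, boolean writeToWal) {
        if (writeToWal) wal.add(row + "=" + value);      // durability first
        memStore.put(row, value);
        if (memStore.size() >= MEMSTORE_LIMIT) flush();
    }

    static void flush() {
        // In HBase the sorted MemStore contents are written out as an
        // immutable HFile on HDFS; here we just snapshot the map.
        hfiles.add(new TreeMap<>(memStore));
        memStore.clear();
    }

    public static void main(String[] args) {
        put("r1", "a", true);
        put("r2", "b", true);
        put("r3", "c", true);   // third put triggers a flush
        System.out.println("hfiles: " + hfiles.size() + ", memstore: " + memStore.size());
        // prints: hfiles: 1, memstore: 0
    }
}
```

Skipping the WAL (passing `false`) makes writes faster but means edits still sitting in the MemStore are lost if the region server crashes before a flush.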
Upvotes: 1