I've been doing some research on HBase, and I'm having trouble understanding how the HBase read path works. I have a basic understanding of it, but I don't clearly understand how it reads multiple HFiles while checking Bloom filters. What's the purpose of metablocks, and how does HBase use them when reading data? What's the purpose of the indexes in HFiles, and how are they used?
Hence I need your help in understanding these concepts.
Your time is much appreciated. Thanks!
If there is more than one HFile at the time of the read, HBase checks whether the row in question is present in each of them. Wherever it is present, HBase reads that row from those HFiles (and also from the memstore) and merges the results, so that the client always gets the latest data. I'm sorry, I didn't quite get the block filters thing. Could you please point me to the source where you read about this? That'll help me provide a complete answer. (Do you mean Bloom filter?)
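The merge described above can be sketched as a toy model. Everything here is hypothetical (`ToyStoreFile`, `get_latest`, and modelling each file's row filter as a plain set); these are not HBase's actual classes. The sketch only shows the idea: collect the row's versions from the memstore and from every HFile that might contain the row, skip files whose filter rules the row out, and return the newest version by timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStoreFile:
    """Hypothetical stand-in for one HFile: row key -> list of (timestamp, value)."""
    cells: dict
    # Stand-in for the file's row Bloom filter. A plain set has no false
    # positives; a real Bloom filter may, but it never has false negatives.
    row_filter: frozenset = field(init=False)

    def __post_init__(self):
        self.row_filter = frozenset(self.cells)


def get_latest(row, memstore, store_files):
    """Return ((timestamp, value) or None, number of store files actually read)."""
    candidates = list(memstore.get(row, []))
    files_read = 0
    for sf in store_files:
        if row not in sf.row_filter:
            continue  # filter says "definitely not here": skip this file entirely
        files_read += 1
        candidates.extend(sf.cells.get(row, []))
    newest = max(candidates, key=lambda cell: cell[0]) if candidates else None
    return newest, files_read


memstore = {"row1": [(30, "c")]}
files = [
    ToyStoreFile({"row1": [(10, "a")], "row2": [(5, "x")]}),
    ToyStoreFile({"row1": [(20, "b")]}),
    ToyStoreFile({"row3": [(1, "z")]}),
]
print(get_latest("row1", memstore, files))  # ((30, 'c'), 2)
```

The third file is never read for `row1` because its filter excludes the row. That saving is the reason per-file filters matter when a store has many HFiles.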
The purpose of a metablock is to hold a large amount of data. Metablocks are used by HFile to store a BloomFilter, and a string key is associated with each metablock. Metablocks are kept in memory until HFile.close() is called.
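The following sketch makes the metablock idea concrete. It builds a minimal Bloom filter, stores it as a metablock under a string key, and reads it back. The details are assumptions: the class, the `"bloom"` key, and the SHA-256-based double hashing are illustrative choices, not HBase's on-disk format. HBase uses its own hash functions.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: answers 'definitely absent' or 'maybe present'."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: bytes):
        # Double hashing: derive k bit positions from one SHA-256 digest.
        digest = hashlib.sha256(key).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: bytes) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

    def to_bytes(self) -> bytes:
        return bytes(self.bits)

    @classmethod
    def from_bytes(cls, data: bytes, num_bits: int = 1024, num_hashes: int = 3):
        bf = cls(num_bits, num_hashes)
        if len(data) != len(bf.bits):
            raise ValueError("bit array size does not match num_bits")
        bf.bits = bytearray(data)
        return bf


# Writer side: build the filter while appending rows, then store it as a
# metablock under a string key ("bloom" is a hypothetical key name).
bf = BloomFilter()
for row in (b"row1", b"row2", b"row3"):
    bf.add(row)
meta_blocks = {"bloom": bf.to_bytes()}

# Reader side: load the metablock and consult it before touching data blocks.
restored = BloomFilter.from_bytes(meta_blocks["bloom"])
print(restored.might_contain(b"row2"))  # True
```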
An index is written for the metablocks (and likewise for the data blocks) to make reads faster. Each index contains n records, where n is the number of blocks, holding per-block information: block offset, size and first key.
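Here is a minimal sketch of how such an index is used. It assumes the index is already in memory and sorted by first key. A binary search over the first keys finds the single block that could contain a key, so only that block has to be read from disk. `IndexEntry` and `find_block` are hypothetical names.

```python
import bisect
from typing import List, NamedTuple, Optional


class IndexEntry(NamedTuple):
    first_key: bytes  # first key stored in the block
    offset: int       # byte offset of the block within the file
    size: int         # on-disk size of the block in bytes


def find_block(index: List[IndexEntry], key: bytes) -> Optional[IndexEntry]:
    """Binary-search the in-memory index for the only block that could hold `key`.

    The index is sorted by first_key, so the candidate is the last block whose
    first key is <= key. Returns None if key sorts before the first block.
    """
    # A real reader would keep this array rather than rebuild it per lookup.
    first_keys = [entry.first_key for entry in index]
    i = bisect.bisect_right(first_keys, key) - 1
    return index[i] if i >= 0 else None


index = [
    IndexEntry(b"apple", 0, 4096),
    IndexEntry(b"mango", 4096, 4096),
    IndexEntry(b"peach", 8192, 2048),
]
print(find_block(index, b"kiwi").offset)  # 0
```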
Finally, a Fixed File Trailer is written at the end of the HFile. It contains the offsets and counts for all the HFile indices, the HFile version, the compression codec, etc. When a read starts, HFile.loadFileInfo() is called first, and the file trailer written earlier is loaded into memory along with all the indices. This allows keys to be looked up efficiently. Then, with the help of an HFileScanner, the client seeks to a specified key and iterates from there to read the data.
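The write and read order above can be sketched with a toy format. On write, the data blocks come first, then the index, then a fixed-size trailer. On read, the trailer is loaded first, then the index, and then the scan seeks and iterates. This is an illustration under stated assumptions, not the real HFile layout. The byte layout, `TRAILER` and the function names are hypothetical. There are no metablocks or compression, and the trailer records only the index offset, the entry count and a version number.

```python
import bisect
import struct

# Hypothetical toy layout: [data blocks][block index][fixed-size trailer].
# The trailer holds the index offset, the index entry count and a format version.
TRAILER = struct.Struct(">QII")
TOY_VERSION = 1


def _pack_lp(b: bytes) -> bytes:
    """Length-prefix a byte string with a 4-byte big-endian length."""
    return struct.pack(">I", len(b)) + b


def _read_lp(buf: bytes, pos: int):
    """Read a length-prefixed byte string at pos; return (bytes, next_pos)."""
    (n,) = struct.unpack_from(">I", buf, pos)
    start = pos + 4
    return buf[start:start + n], start + n


def write_toy_hfile(items, entries_per_block=2) -> bytes:
    """items: (key, value) byte pairs, already sorted by key."""
    out = bytearray()
    index = []  # one (first_key, offset, size) entry per data block
    for i in range(0, len(items), entries_per_block):
        chunk = items[i:i + entries_per_block]
        block = b"".join(_pack_lp(k) + _pack_lp(v) for k, v in chunk)
        index.append((chunk[0][0], len(out), len(block)))
        out += block
    index_offset = len(out)
    for first_key, offset, size in index:
        out += _pack_lp(first_key) + struct.pack(">QI", offset, size)
    # The trailer goes last, so a reader finds it at a fixed distance from EOF.
    out += TRAILER.pack(index_offset, len(index), TOY_VERSION)
    return bytes(out)


def load_index(data: bytes):
    """Read the trailer from the end of the file, then load the whole index."""
    index_offset, count, _version = TRAILER.unpack_from(data, len(data) - TRAILER.size)
    index, pos = [], index_offset
    for _ in range(count):
        first_key, pos = _read_lp(data, pos)
        offset, size = struct.unpack_from(">QI", data, pos)
        pos += 12  # ">QI" is 8 + 4 bytes
        index.append((first_key, offset, size))
    return index


def scan_from(data: bytes, index, start_key: bytes):
    """Seek to the block that may hold start_key, then iterate forward in key order."""
    first_keys = [entry[0] for entry in index]
    b = max(bisect.bisect_right(first_keys, start_key) - 1, 0)
    for _, offset, size in index[b:]:
        block, pos = data[offset:offset + size], 0
        while pos < len(block):
            key, pos = _read_lp(block, pos)
            value, pos = _read_lp(block, pos)
            if key >= start_key:
                yield key, value


items = [(b"a", b"1"), (b"b", b"2"), (b"c", b"3"), (b"d", b"4"), (b"e", b"5")]
data = write_toy_hfile(items)
index = load_index(data)
print([k for k, _ in scan_from(data, index, b"bb")])  # [b'c', b'd', b'e']
```

Because the trailer has a fixed size, the reader can always locate it by reading a constant number of bytes back from the end of the file. That is why it is written last, and why it can point to everything else.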
I'd like to point you to the links that helped me understand these things. Hopefully you'll find them useful.
Link 1: Apache HBase I/O – HFile (Cloudera)
Link 2: HBase I/O: HFile (th30z)
HTH