Reputation: 407
I'm going through some HBase Architecture notes here: https://mapr.com/blog/in-depth-look-hbase-architecture/ and saw it said
There is one MemStore per Column Family; when one is full, they all flush. It also saves the last written sequence number so the system knows what was persisted so far.
My question is two-fold.
1 and 2. If 1 is flushed, then for future Gets we can still check 2 before checking disk (HFiles) for 2's Column Family, right?
1 with row keys a, b, and d and I flush them. What's the "last written sequence number"?
Upvotes: 1
Views: 620
Reputation: 699
Let's start with how HBase handles write operations. When you perform a write to HBase, it does the following (simplified view):
Each write operation is marked with a 'sequence number'. This is a sort of MVCC transaction ID. Quote from the HBase docs:
A region-specific unique monotonically increasing sequence ID given to each Cell. It always exists for cells in the memstore but is not retained forever.
The sequence number is written into the WAL as part of the write operation, along with the new KV. After a successful write to the WAL, HBase applies the changes to the MemStore and responds to the client that the write succeeded. From this point on, the new KV is persisted and will not be lost if the RegionServer dies.
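The ordering above (WAL first, then MemStore, then acknowledge) can be sketched with a toy model. This is not HBase code: `ToyRegion` and its fields are hypothetical names made up for illustration, and the real write path involves far more (locking, MVCC, syncing to HDFS).

```python
# Toy model of the write path described above; not the HBase implementation.
class ToyRegion:
    def __init__(self):
        self.next_seqnum = 1   # region-specific, monotonically increasing
        self.wal = []          # append-only log of (seqnum, family, row, value)
        self.memstores = {}    # one in-memory map per column family

    def put(self, family, row, value):
        seqnum = self.next_seqnum
        self.next_seqnum += 1
        # 1. Record the edit in the WAL first, so it survives a crash.
        self.wal.append((seqnum, family, row, value))
        # 2. Only then apply it to the column family's MemStore.
        self.memstores.setdefault(family, {})[row] = (seqnum, value)
        # 3. Acknowledge the write by returning its sequence number.
        return seqnum


region = ToyRegion()
first = region.put("a", "row1", "x")
second = region.put("b", "row1", "y")
print(first, second, len(region.wal))  # 1 2 2
```

Note that both column families share one sequence counter and one WAL, which is what makes the region-wide bookkeeping discussed next necessary.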
Because each write increases the size of the WAL, HBase has to truncate it to reduce disk usage. Before truncating, the WAL must be sure that the changes described by its entries have been durably persisted to disk (so that no updates are lost if the server crashes). For that purpose, the WAL tracks the aforementioned "last written sequence number" (LWSN) of each region that belongs to the RegionServer.
A region's LWSN represents the most recent write that has been flushed to disk. All write operations with a greater seqnum reside only in the MemStore and are not on disk yet. The WAL uses the region's LWSN to find entries whose seqnum is at or below it. Such entries can be removed from the WAL because they have been flushed to disk and will not be lost if the server crashes.
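As a rough sketch of that truncation rule, assuming an entry is safe to drop once its seqnum is at or below its region's LWSN: `truncate_wal` and the `(region, seqnum)` entry shape are hypothetical, and real HBase archives whole WAL files rather than filtering individual entries.

```python
def truncate_wal(wal, lwsn_by_region):
    """Keep only the entries not yet covered by their region's LWSN."""
    return [
        (region, seqnum)
        for region, seqnum in wal
        if seqnum > lwsn_by_region.get(region, 0)
    ]


# One RegionServer WAL holding edits for two regions.
wal = [("r1", 1), ("r1", 2), ("r2", 1), ("r1", 3), ("r2", 2)]
# r1 has flushed everything up to seqnum 2; r2 has never flushed.
lwsn_by_region = {"r1": 2, "r2": 0}

remaining = truncate_wal(wal, lwsn_by_region)
print(remaining)  # [('r2', 1), ('r1', 3), ('r2', 2)]
```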
Let's look at an example of how HBase tracks the LWSN. Suppose you have two column families, 'a' and 'b'. You perform 200 write operations: the first 100 go to 'a' and the other 100 to 'b'. The seqnums of the operations on column family 'a' are in the range [1..100], and those for 'b' are in [101..200]. Suppose the writes to 'b' are larger and trigger a flush of the MemStore of 'b', but not of 'a'. During this flush, HBase should update the region's LWSN. It would be incorrect to set it to 200, because the writes with seqnums [1..100] are not persisted yet (and must not be truncated from the WAL).
That's why HBase flushes the MemStores of all column families at once: if it flushed only the full MemStore, it could not advance the region's LWSN, and WAL truncation would be delayed (which can lead to a long server recovery after a crash).
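The same example in a few lines of code, with 100 writes to 'a' (seqnums 1..100) and 100 writes to 'b' (seqnums 101..200). `safe_lwsn` is a hypothetical helper, not an HBase function; it only illustrates why a partial flush cannot advance the region-wide number.

```python
def safe_lwsn(unflushed_by_family, last_assigned):
    """Highest seqnum such that every write at or below it is on disk."""
    pending = [min(seqnums) for seqnums in unflushed_by_family.values() if seqnums]
    if not pending:
        return last_assigned      # nothing left in any MemStore
    return min(pending) - 1       # stop just before the oldest unflushed write


last_assigned = 200
writes_a = list(range(1, 101))    # seqnums 1..100 in family 'a'
writes_b = list(range(101, 201))  # seqnums 101..200 in family 'b'

# Flush only 'b': the writes to 'a' are still only in memory.
only_b = safe_lwsn({"a": writes_a, "b": []}, last_assigned)
# Flush both families together.
both = safe_lwsn({"a": [], "b": []}, last_assigned)

print(only_b, both)  # 0 200
```

Flushing only 'b' leaves the safe LWSN at 0, so nothing can be dropped from the WAL; flushing both lets it jump to 200.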
Upvotes: 2