Jing

Reputation: 945

How does bulk load work in databases such as HBase/Cassandra/KV stores?

I understand from https://svn.apache.org/repos/asf/hbase/hbase.apache.org/trunk/0.94/book/arch.bulk.load.html that we can process data into HBase data files and then ask HBase to bulk load those files so it can serve them. My question is more about how that works under the hood. It came up while I was trying to learn how HBase (or any other online k/v store) can perform a bulk load while handling normal read/write queries at the same time, without downtime or a significant performance hit.

I imagine it'd be something like:

  1. have background processes write the new data files to disk
  2. once the new data is ready, lock the in-memory data structure that tells HBase where to read data on disk
  3. update this in-memory data structure to point to the new files on disk
  4. unlock
  5. now all incoming read/write requests will be directed to the new data

Is this roughly correct? If not, what is the underlying process for bulk load?
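For reference, the five steps above can be sketched as a toy model. This is purely illustrative, not HBase's actual implementation; all class and file names here are made up. The point is that the expensive work (writing files) happens outside the lock, and the lock only protects a cheap pointer swap, so readers are blocked at most momentarily:

```python
import threading

class FileBackedStore:
    """Toy model of the mechanism described in the question: readers
    consult an in-memory list of data files; a bulk load prepares new
    files off to the side, then swaps them in under a short lock."""

    def __init__(self, files):
        self._lock = threading.Lock()
        self._files = list(files)  # files currently serving reads

    def read_files(self):
        # Readers hold the lock only long enough to copy the current list.
        with self._lock:
            return list(self._files)

    def bulk_load(self, new_files):
        # Step 1: prepare the new data outside the lock (no downtime).
        prepared = list(new_files)
        # Steps 2-4: lock, repoint the in-memory structure, unlock.
        # The swap itself is O(1)-ish, so contention is minimal.
        with self._lock:
            self._files = self._files + prepared
        # Step 5: all subsequent reads see the new files.

store = FileBackedStore(["hfile-1"])
store.bulk_load(["hfile-2", "hfile-3"])
print(store.read_files())  # -> ['hfile-1', 'hfile-2', 'hfile-3']
```

In practice, stores that keep their file list in an immutable snapshot can replace the lock with an atomic reference swap, so readers never block at all.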

Upvotes: 0

Views: 29

Answers (0)
