HBase Bulk Loading into multiple tables

Question

We are using the HBase bulk-loading technique explained in http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/, that is, creating HFiles directly with HFileOutputFormat.

We need this approach to pre-populate the HBase cluster with all the data we already have in our legacy system(s).

Because HBase does not natively support secondary indexes, we maintain secondary tables (indexes) at the application level.
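To make "indexes at the application level" concrete, here is a minimal sketch of one common row-key layout, assuming a hypothetical scheme in which an index row key is the indexed value, a 0x00 separator, and the main table's row key. The class name `IndexKeys` and the sample values are illustrative, and the layout only works if the indexed value never contains a 0x00 byte. It needs nothing beyond the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical secondary-index row key: indexedValue + 0x00 + mainRowKey.
 * Assumption: the indexed value never contains a 0x00 byte.
 */
final class IndexKeys {
    private static final byte SEPARATOR = 0x00;

    private IndexKeys() {}

    /** Builds the index-table row key for one main-table row. */
    static byte[] indexRowKey(String indexedValue, String mainRowKey) {
        byte[] value = indexedValue.getBytes(StandardCharsets.UTF_8);
        byte[] main = mainRowKey.getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[value.length + 1 + main.length];
        System.arraycopy(value, 0, key, 0, value.length);
        key[value.length] = SEPARATOR;
        System.arraycopy(main, 0, key, value.length + 1, main.length);
        return key;
    }

    /** Recovers the main-table row key that an index row points back to. */
    static String mainRowKeyOf(byte[] indexRowKey) {
        for (int i = 0; i < indexRowKey.length; i++) {
            if (indexRowKey[i] == SEPARATOR) {
                byte[] main = Arrays.copyOfRange(indexRowKey, i + 1, indexRowKey.length);
                return new String(main, StandardCharsets.UTF_8);
            }
        }
        throw new IllegalArgumentException("no separator in index row key");
    }

    public static void main(String[] args) {
        byte[] key = indexRowKey("alice@example.com", "user-0042");
        System.out.println(mainRowKeyOf(key)); // prints user-0042
    }
}
```

Putting the indexed value first means an HBase prefix scan on the index table finds every main row with that value; the separator keeps values that are prefixes of each other from interleaving.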

So the question is: how can we use the bulk-load technique to create HFiles for several different tables (the main table and the secondary tables/indexes)? Is there any multi-table HFileOutputFormat (something like HFileMultiOutputFormat)?
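One way a single job's output could serve several tables is to prefix every output key with its target table name. The sketch below is an assumption, not an existing HBase API: it shows only the key trick, with a plain JDK sort standing in for the MapReduce shuffle. Because the 0x00 separator sorts below every table-name character, one unsigned-byte sort (the order HBase uses for row keys) keeps each table's rows contiguous and in row-key order, which is what a per-table HFile writer would need. A real job would also need a partitioner that respects each table's region boundaries, which is not shown. The table names are hypothetical, and `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: table-prefixed composite keys so one sorted output splits cleanly per table. */
final class CompositeKeys {
    private static final byte SEPARATOR = 0x00;

    private CompositeKeys() {}

    /** Prefixes a row key with its target table name: table + 0x00 + row. */
    static byte[] compositeKey(String table, String rowKey) {
        byte[] t = table.getBytes(StandardCharsets.UTF_8);
        byte[] r = rowKey.getBytes(StandardCharsets.UTF_8);
        byte[] key = new byte[t.length + 1 + r.length];
        System.arraycopy(t, 0, key, 0, t.length);
        key[t.length] = SEPARATOR;
        System.arraycopy(r, 0, key, t.length + 1, r.length);
        return key;
    }

    /** Sorts keys as unsigned bytes, then splits them into per-table row-key lists. */
    static Map<String, List<String>> sortAndSplit(List<byte[]> keys) {
        List<byte[]> sorted = new ArrayList<>(keys);
        sorted.sort(Arrays::compareUnsigned); // stand-in for the MapReduce shuffle sort
        Map<String, List<String>> perTable = new LinkedHashMap<>();
        for (byte[] key : sorted) {
            int sep = 0;
            while (key[sep] != SEPARATOR) {
                sep++; // table names never contain 0x00, so the first one is the separator
            }
            String table = new String(key, 0, sep, StandardCharsets.UTF_8);
            String row = new String(key, sep + 1, key.length - sep - 1, StandardCharsets.UTF_8);
            perTable.computeIfAbsent(table, name -> new ArrayList<>()).add(row);
        }
        return perTable;
    }

    public static void main(String[] args) {
        List<byte[]> keys = Arrays.asList(
                compositeKey("users_by_email", "b@x"),
                compositeKey("users", "u2"),
                compositeKey("users_by_email", "a@x"),
                compositeKey("users", "u1"));
        System.out.println(sortAndSplit(keys));
        // prints {users=[u1, u2], users_by_email=[a@x, b@x]}
    }
}
```

Note that "users" is a prefix of "users_by_email", yet all "users" rows still sort first, because 0x00 compares below '_'.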

I understand that we could create multiple MR jobs and run each one separately, but the cost comes from reading so much data (more than a few TB) again for every job. I want a way to read the input once and write it out multiple times. Chaining MR jobs does not help either: all of the map tasks need the same input data, whereas chaining feeds the second map task only the output of the first.
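The "read once, write many" idea amounts to a single map function that parses each record one time and fans it out to every destination table. The sketch below assumes a hypothetical legacy record format `id,email,country`, hypothetical table names (`users`, `users_by_email`, `users_by_country`) and hypothetical `family:qualifier` column names. In a real job this body would sit inside a Mapper's `map()` and each `Emit` would become an HBase cell written to the right table's output; that wiring is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a single-pass fan-out from one input record to several tables. */
final class FanOut {
    /** One cell destined for a specific table (all names are hypothetical). */
    static final class Emit {
        final String table;
        final String rowKey;
        final String column;
        final String value;

        Emit(String table, String rowKey, String column, String value) {
            this.table = table;
            this.rowKey = rowKey;
            this.column = column;
            this.value = value;
        }
    }

    /** Parses one "id,email,country" record exactly once and emits cells for every table. */
    static List<Emit> map(String line) {
        String[] f = line.split(",", -1);
        if (f.length != 3) {
            throw new IllegalArgumentException("bad record: " + line);
        }
        String id = f[0];
        String email = f[1];
        String country = f[2];
        List<Emit> out = new ArrayList<>();
        // Main table: one row per legacy record.
        out.add(new Emit("users", id, "d:email", email));
        out.add(new Emit("users", id, "d:country", country));
        // Index tables: indexed value + 0x00 + main row key, pointing back to the main row.
        out.add(new Emit("users_by_email", email + '\0' + id, "d:id", id));
        out.add(new Emit("users_by_country", country + '\0' + id, "d:id", id));
        return out;
    }

    public static void main(String[] args) {
        String[] input = {"u1,a@x,DE", "u2,b@x,FR"};
        List<Emit> cells = new ArrayList<>();
        for (String line : input) {
            cells.addAll(map(line)); // each input line is read and parsed once
        }
        System.out.println(input.length + " records read, " + cells.size() + " cells emitted");
        // prints 2 records read, 8 cells emitted
    }
}
```

Adding another index then costs extra output, not another full scan of the multi-TB input.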

Similar questions have been asked here and here, but they are unanswered, so I am asking again.
