Reputation: 11
I am running an ETL job with Hadoop where I need to output the valid, transformed data to HBase, plus an external index for that data into MySQL. My initial thought is that I could use MultipleOutputs to export the transformed data with HFileOutputFormat (key is Text and value is ProtobufWritable), and the index with TextOutputFormat (key is Text and value is Text).
The number of input records for an average-sized job is about 700 million, and I'll need the ability to run many jobs at once.
I'm wondering A) whether this seems like a reasonable approach in terms of efficiency and complexity, and B) how to accomplish it with the CDH3 distribution's API, if that's possible.
Upvotes: 1
Views: 717
Reputation: 2181
If you're using the old MapReduce API, then you can use MultipleOutputs to write to multiple output formats.
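As a rough sketch of what that looks like with the old (`org.apache.hadoop.mapred`) API, which is what CDH3 ships: you register each named output with its own OutputFormat in the driver, then fetch a collector by name in the reducer. The channel names `"data"` and `"index"`, the class names, and the use of TextOutputFormat for both channels are illustrative only (in your case the data channel would be HFileOutputFormat with a ProtobufWritable value); this isn't runnable without the Hadoop jars on the classpath.

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class EtlReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  private MultipleOutputs mos;

  // In the driver, register one named output per format:
  //   MultipleOutputs.addNamedOutput(conf, "data",
  //       TextOutputFormat.class, Text.class, Text.class);
  //   MultipleOutputs.addNamedOutput(conf, "index",
  //       TextOutputFormat.class, Text.class, Text.class);

  @Override
  public void configure(JobConf conf) {
    mos = new MultipleOutputs(conf);
  }

  @Override
  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      Text value = values.next();
      // Route each record to the appropriate named output by name.
      mos.getCollector("data", reporter).collect(key, value);
      mos.getCollector("index", reporter).collect(key, value);
    }
  }

  @Override
  public void close() throws IOException {
    // Must close MultipleOutputs, or the side files may be incomplete.
    mos.close();
  }
}
```

Each named output gets its own files alongside the job's regular output, so the two formats never collide.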
However, if you're using the new MapReduce API, I'm not sure there's a built-in way to do what you're trying to do. You might have to pay the price of running another MapReduce job over the same inputs, but I'd have to do more research before saying that for sure. There might also be a way to hack the old and new APIs together to let you use MultipleOutputs with the new API.
EDIT: Have a look at this post. You can probably implement your own OutputFormat that wraps the appropriate RecordWriters and use it to write to multiple output formats.
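The wrapping idea could be sketched like this for the new (`org.apache.hadoop.mapreduce`) API. The class name `DualOutputFormat`, the choice of TextOutputFormat for both delegates, and the routing logic are all assumptions for illustration; in a real implementation each delegate must be pointed at a distinct output path (two TextOutputFormats writing to the same work file will collide), and one of them would be HFileOutputFormat. Not runnable without the Hadoop jars.

```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/** Hypothetical OutputFormat that fans records out to two delegates. */
public class DualOutputFormat extends OutputFormat<Text, Text> {

  private final TextOutputFormat<Text, Text> dataFmt =
      new TextOutputFormat<Text, Text>();
  private final TextOutputFormat<Text, Text> indexFmt =
      new TextOutputFormat<Text, Text>();

  @Override
  public RecordWriter<Text, Text> getRecordWriter(TaskAttemptContext ctx)
      throws IOException, InterruptedException {
    // NOTE: in practice the two delegates need distinct output paths;
    // this sketch glosses over that.
    final RecordWriter<Text, Text> data = dataFmt.getRecordWriter(ctx);
    final RecordWriter<Text, Text> index = indexFmt.getRecordWriter(ctx);

    return new RecordWriter<Text, Text>() {
      @Override
      public void write(Text key, Text value)
          throws IOException, InterruptedException {
        // Route here however you like: both writers, or one per record.
        data.write(key, value);
        index.write(key, value);
      }

      @Override
      public void close(TaskAttemptContext c)
          throws IOException, InterruptedException {
        data.close(c);
        index.close(c);
      }
    };
  }

  @Override
  public void checkOutputSpecs(JobContext ctx)
      throws IOException, InterruptedException {
    dataFmt.checkOutputSpecs(ctx);
  }

  @Override
  public OutputCommitter getOutputCommitter(TaskAttemptContext ctx)
      throws IOException, InterruptedException {
    // Delegating to a single committer is a simplification; committing
    // two formats correctly takes more care than shown here.
    return dataFmt.getOutputCommitter(ctx);
  }
}
```

You'd then set it on the job with `job.setOutputFormatClass(DualOutputFormat.class)`.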
Upvotes: 1