Tucker

Reputation: 7362

HBase MapReduce output to HDFS & HBase

I have a MapReduce program that first scans an HBase table.

I want some of the reducer output to go to HDFS and some to be written to an HBase table. Can a reducer be configured to write to two different locations and formats like this?

Upvotes: 2

Views: 1309

Answers (3)

najeeb

Reputation: 813

I think MultipleOutputs can do the job. Check this out: http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html
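For illustration, here is a minimal sketch of a reducer that uses MultipleOutputs, assuming the newer org.apache.hadoop.mapreduce API. The class name SplitReducer, the named output "large", and the threshold of 100 are made up for the example and are not from the question. It needs the Hadoop client libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical reducer: sums the values per key and routes each total
// to one of two outputs depending on an arbitrary threshold.
public class SplitReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private MultipleOutputs<Text, IntWritable> outputs;

    @Override
    protected void setup(Context context) {
        outputs = new MultipleOutputs<Text, IntWritable>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        IntWritable total = new IntWritable(sum);

        if (sum >= 100) {
            // Goes to files for the named output "large" (name is an assumption).
            outputs.write("large", key, total);
        } else {
            // Goes to the job's regular output files.
            context.write(key, total);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Flushes and closes the extra record writers.
        outputs.close();
    }
}
```

The named output has to be registered in the driver before the job is submitted, for example with MultipleOutputs.addNamedOutput(job, "large", TextOutputFormat.class, Text.class, IntWritable.class). Named outputs must be alphanumeric.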

Upvotes: 1

Vinayak Ponangi

Reputation: 430

If you don't want to write much code, just open a Table in your mapper's or reducer's setup method and issue puts against your HBase table directly. Separately, write your job so that its regular output is a file on HDFS. This way you get to write to both HBase and HDFS.

To elaborate: when you call context.write(), the record goes to the HDFS file, while table.put() writes to HBase.

Also, don't forget to close the table and anything else you opened in your cleanup() method. The only drawback is that if there are, say, 1000 mappers, your table connection is opened 1000 times. At any given point, though, only as many connections are open as there are mappers actually running, which would probably be around 50, depending on your setup. Works for me at least!
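A rough sketch of this approach in a reducer, assuming the HBase 1.0+ client API (Connection and Table; older releases used HTable instead). The class name DualSinkReducer, the table name "summary", the column family "d", and the qualifier "total" are placeholders, not anything from the question. It needs the Hadoop and HBase client libraries on the classpath.

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Hypothetical reducer: every total goes to the job's HDFS output,
// and is also written to an HBase table with a direct put.
public class DualSinkReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private Connection connection;
    private Table table;

    @Override
    protected void setup(Context context) throws IOException {
        // One connection per task, opened once before any reduce() call.
        connection = ConnectionFactory.createConnection(
                HBaseConfiguration.create(context.getConfiguration()));
        table = connection.getTable(TableName.valueOf("summary")); // assumed table name
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }

        // Regular job output: ends up in the HDFS output directory.
        context.write(key, new IntWritable(sum));

        // Side write to HBase; family "d" and qualifier "total" are assumptions.
        Put put = new Put(Bytes.toBytes(key.toString()));
        put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("total"), Bytes.toBytes(sum));
        table.put(put);
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        // Release the table and the connection when the task finishes.
        table.close();
        connection.close();
    }
}
```

To send only some records to HBase, wrap the put (or the context.write()) in whatever condition separates the two streams.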

Upvotes: 1

coltfred

Reputation: 1480

A reducer can be configured to write to multiple files using the MultipleOutputs class. The documentation at the top of that class provides a clear example of writing to multiple files. However, since there is no built-in OutputFormat for writing to HBase, you might consider writing the second stream to a specific place on HDFS and then using another job to insert it into HBase.

Upvotes: 3
