Reputation: 7362
I have a MapReduce program that first scans an HBase table.
I want some reducer output to go to HDFS and some to be written to an HBase table. Can a reducer be configured to output to two different locations/formats like this?
Upvotes: 2
Views: 1309
Reputation: 813
I think MultipleOutputs can do the job. Check it out: http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html
Upvotes: 1
Reputation: 430
If you don't want to write too much code, just open a Table in your mapper's or reducer's setup() method and issue puts against your HBase table from there. Separately, configure your job so that its regular output is an HDFS file. This way you get to write to both HBase and HDFS.
To elaborate: each context.write() call goes to the HDFS output file, while each table.put() call writes directly to the HBase table.
Also, don't forget to close the table and anything else you opened in your cleanup() method. The only drawback is that if there are, let's say, 1,000 mappers, your table connection will be opened 1,000 times. At any given point, though, only the maximum number of concurrent mappers actually runs, so that would probably be around 50 open connections, depending on your setup. Works for me at least!
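Here is a rough sketch of that lifecycle, in case it helps to see the shape of it. It deliberately uses plain Java stand-ins so it runs without a cluster: FakeTable, FakeFileOutput, DualSinkReducer and the "hbase:" key-prefix routing rule are all made up for illustration. In a real job, the three methods would be the setup(), reduce() and cleanup() overrides of your Reducer, fileOutput.write() would be context.write(), and the table would be a handle from the HBase client.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DualSinkSketch {

    /** Stand-in for an HBase table handle; a real job would use the HBase client instead. */
    static class FakeTable implements AutoCloseable {
        final List<String> puts = new ArrayList<>();
        boolean closed = false;

        void put(String rowKey, long value) {
            if (closed) {
                throw new IllegalStateException("table already closed");
            }
            puts.add(rowKey + "=" + value);
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Stand-in for the job's file output, i.e. the destination of context.write(). */
    static class FakeFileOutput {
        final List<String> lines = new ArrayList<>();

        void write(String key, long value) {
            lines.add(key + "\t" + value);
        }
    }

    /** Follows the setup / reduce / cleanup lifecycle of a reducer task. */
    static class DualSinkReducer {
        final FakeFileOutput fileOutput = new FakeFileOutput();
        FakeTable table;

        // Called once per task: open the table here, not once per record.
        void setup() {
            table = new FakeTable();
        }

        // Called once per key: sum the values, then pick a destination.
        void reduce(String key, List<Long> values) {
            long sum = 0;
            for (long v : values) {
                sum += v;
            }
            if (key.startsWith("hbase:")) {   // hypothetical routing rule
                table.put(key, sum);          // side channel: straight to the table
            } else {
                fileOutput.write(key, sum);   // regular job output on HDFS
            }
        }

        // Called once per task: release the table handle.
        void cleanup() {
            table.close();
        }
    }

    public static void main(String[] args) {
        DualSinkReducer reducer = new DualSinkReducer();
        reducer.setup();
        reducer.reduce("hbase:user1", Arrays.asList(1L, 2L, 3L));
        reducer.reduce("user2", Arrays.asList(10L, 20L));
        reducer.cleanup();
        System.out.println("table puts:  " + reducer.table.puts);
        System.out.println("file output: " + reducer.fileOutput.lines);
    }
}
```

The point of the layout is that the table is opened once per task in setup() and closed once in cleanup(), so the per-record cost is only the put itself.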
Upvotes: 1
Reputation: 1480
A reducer can be configured to write to multiple files using the MultipleOutputs class. The documentation at the top of that class provides a clear example of writing to multiple files. However, since Hadoop itself has no built-in OutputFormat for writing to HBase, you might consider writing the second stream to a specific place on HDFS and then using another job to insert it into HBase.
Upvotes: 3