Reputation: 11

Any idea how to write a Hadoop InputFormat / OutputFormat for HBase?

Does anyone have experience writing a Hadoop InputFormat/OutputFormat that gets its data from HBase?

I'd like something more specific than HbaseTableInputFormat, because my idea is to return my business objects directly to the MapReduce program. That means being able to build an object whose data is spread across several rows.

Thanks for your help, Ech

Upvotes: 1

Answers (2)

QuinnG

Reputation: 6424

You might be able to extend RecordReader and/or FileInputFormat and implement what you need inside those. Alternatively, extend HbaseTableInputFormat and override the methods whose behavior you need to change. (I haven't worked with HbaseTableInputFormat, so I'm not sure exactly what you'd do; it's just an idea to look at.)
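To make the RecordReader idea concrete, here is a minimal plain-Java sketch that runs without Hadoop or HBase on the classpath. It borrows the nextKeyValue()/getCurrentKey()/getCurrentValue() method names from Hadoop's RecordReader but reads from an ordinary Iterator; Row, BusinessObject, GroupingReader, and the "entity#suffix" row-key layout are all hypothetical. Because a scan returns rows sorted by key, all rows of one entity are contiguous, so a one-row lookahead is enough to tell where one object ends and the next begins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class GroupingReaderSketch {

    // Hypothetical stand-in for one HBase row (a Result in a real job).
    static final class Row {
        final String key;
        final String value;
        Row(String key, String value) { this.key = key; this.value = value; }
    }

    // Hypothetical business object assembled from several rows.
    static final class BusinessObject {
        final String id;
        final List<String> parts;
        BusinessObject(String id, List<String> parts) { this.id = id; this.parts = parts; }
    }

    // Same method names as Hadoop's RecordReader, but backed by a plain Iterator.
    static class GroupingReader {
        private final Iterator<Row> rows;
        private Row pending;             // lookahead: first row of the next object
        private BusinessObject current;  // object returned by getCurrentValue()

        GroupingReader(Iterator<Row> rows) {
            this.rows = rows;
            this.pending = rows.hasNext() ? rows.next() : null;
        }

        // Assumed key design: "cust42#order1" belongs to entity "cust42".
        static String entityId(String rowKey) {
            int sep = rowKey.indexOf('#');
            return sep < 0 ? rowKey : rowKey.substring(0, sep);
        }

        boolean nextKeyValue() {
            if (pending == null) {
                current = null;
                return false;
            }
            String id = entityId(pending.key);
            List<String> parts = new ArrayList<>();
            // Scans return rows sorted by key, so one entity's rows are contiguous.
            while (pending != null && entityId(pending.key).equals(id)) {
                parts.add(pending.value);
                pending = rows.hasNext() ? rows.next() : null;
            }
            current = new BusinessObject(id, parts);
            return true;
        }

        String getCurrentKey() { return current == null ? null : current.id; }

        BusinessObject getCurrentValue() { return current; }
    }

    public static void main(String[] args) {
        List<Row> scan = Arrays.asList(
                new Row("cust1#a", "x"), new Row("cust1#b", "y"), new Row("cust2#a", "z"));
        GroupingReader reader = new GroupingReader(scan.iterator());
        while (reader.nextKeyValue()) {
            System.out.println(reader.getCurrentKey() + " -> " + reader.getCurrentValue().parts);
        }
    }
}
```

In a real job, this loop would sit inside the RecordReader returned by your InputFormat's createRecordReader(), with HBase Result objects in place of Row. One caveat: each split is read independently, so an entity whose rows straddle a split (region) boundary would be cut in two unless the split computation is also adjusted, for example by overriding getSplits().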

In a project I worked on, we had to extend RecordReader and FileInputFormat to process W3C log files. The reason was to ensure each mapper had access to the headers, which appear only at the top of the file and not in each chunk.
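The header trick can be sketched the same way. This is a plain-Java simplification, not the project's actual code: it treats a split as a range of line indices (real splits are byte ranges) and assumes a single "#Fields:" directive before the first data line. The point it illustrates is that every split re-reads the header from the top of the file before parsing its own lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class W3CSplitReaderSketch {

    // Finds the "#Fields:" directive at the top of the file, even if this split starts mid-file.
    static List<String> readFieldNames(List<String> fileLines) {
        for (String line : fileLines) {
            if (line.startsWith("#Fields:")) {
                return Arrays.asList(line.substring("#Fields:".length()).trim().split("\\s+"));
            }
            if (!line.startsWith("#")) {
                break; // assumption: directives only appear before the first data line
            }
        }
        throw new IllegalStateException("no #Fields: header found");
    }

    // Parses the data lines in [start, end), as one mapper's split would.
    static List<Map<String, String>> readSplit(List<String> fileLines, int start, int end) {
        List<String> fields = readFieldNames(fileLines);
        List<Map<String, String>> records = new ArrayList<>();
        for (int i = start; i < end; i++) {
            String line = fileLines.get(i).trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip blanks and directives
            String[] values = line.split("\\s+");
            Map<String, String> record = new LinkedHashMap<>();
            for (int f = 0; f < fields.size() && f < values.length; f++) {
                record.put(fields.get(f), values[f]);
            }
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> file = Arrays.asList(
                "#Version: 1.0",
                "#Fields: date time c-ip cs-method",
                "2024-01-01 00:00:01 10.0.0.1 GET",
                "2024-01-01 00:00:02 10.0.0.2 POST");
        // A split covering only the last line still gets named fields.
        System.out.println(readSplit(file, 3, 4));
    }
}
```

In an actual RecordReader, the readFieldNames() step would correspond to opening the file a second time at offset 0 in initialize(), before seeking to the split's own start.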

I haven't extended those classes for HBase, and I'm not sure about your exact situation, so implementing the different functionality in a RecordReader and/or FileInputFormat subclass may or may not work.

Unfortunately, I'm not familiar enough with these systems to give more detailed advice.
Hopefully this points you in the right direction. :)

Upvotes: 1

Cyberax

Reputation: 1737

I don't think that's possible without gross hacks involving the Partitioner. Instead, run a reduce pass over your HBase tables first to collapse multiple rows into a single row, which is later used to construct your business objects.
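A minimal plain-Java sketch of that collapse step, assuming a hypothetical "entity#suffix" row-key layout: cells are grouped by entity (what the map step would emit as its key) and merged into one wide row per entity (the reduce step), with the original row suffix folded into the column name so values from different rows don't collide. In a real job this would likely be a TableMapper/TableReducer pair writing Puts to a new table; the Cell type and the naming scheme here are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CollapseRowsSketch {

    // Hypothetical cell from the source table: row key, column qualifier, value.
    static final class Cell {
        final String row;
        final String qualifier;
        final String value;
        Cell(String row, String qualifier, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    // Groups cells by entity (row-key prefix before '#') and merges each group into
    // one wide row; the old row suffix becomes part of the new column name.
    static Map<String, Map<String, String>> collapse(Iterable<Cell> cells) {
        Map<String, Map<String, String>> wide = new TreeMap<>();
        for (Cell c : cells) {
            int sep = c.row.indexOf('#');
            String entity = sep < 0 ? c.row : c.row.substring(0, sep);
            String suffix = sep < 0 ? "" : c.row.substring(sep + 1);
            String column = suffix.isEmpty() ? c.qualifier : suffix + "." + c.qualifier;
            wide.computeIfAbsent(entity, k -> new TreeMap<>()).put(column, c.value);
        }
        return wide;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
                new Cell("cust1#order1", "amount", "10"),
                new Cell("cust1#order2", "amount", "25"),
                new Cell("cust2#order1", "amount", "7"));
        // One wide row per customer, ready to be turned into a business object.
        System.out.println(collapse(cells));
    }
}
```

The trade-off is an extra pass over the data, but afterwards a stock TableInputFormat hands each mapper exactly one row per business object, so no custom split or RecordReader logic is needed.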

Upvotes: 0
