Reputation: 81
I'm trying to write a MapReduce job that parses a CSV file, stores data in HBase, and performs a reduce function, all in one go. Ideally I would like
I know how to do 1 and 2 using HBase's MultiTableOutputFormat, but I'm unsure how to do 3 and 4.
Any pointers on how to do this would be much appreciated.
I've a few thoughts on how to do this:
For 1 and 2 I would use ImmutableBytesWritable as the key, and MultiTableOutputFormat takes care of storing from the Mapper. But for 3, I would like the key to be Text.
For #4, should I do this in the Mapper by
Upvotes: 1
Views: 677
Reputation: 2682
The mapper reads the CSV by setting KeyValueTextInputFormat as the input format.
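One detail worth noting: KeyValueTextInputFormat splits each line into key and value on a tab by default, so for a CSV you would point it at the comma. A minimal job-setup sketch (the job name is just an example):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;

Configuration conf = new Configuration();
// KeyValueTextInputFormat uses tab as the key/value separator by default;
// override it with a comma so the first CSV field becomes the key.
conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", ",");

Job job = Job.getInstance(conf, "csv-to-hbase");
job.setInputFormatClass(KeyValueTextInputFormat.class);
```

Note that only the first comma is used as the separator; the rest of the line arrives as the value, which the mapper can split further.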
In the mapper code, have some logic to distinguish between good and bad records, and put them into HBase using Put (HBase API calls).
In the mapper's setup(), a handle to the HBase table can be initialized.
The good records can be passed to the reducer using context.write(key, value) and collected in the reducer.
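The steps above can be sketched as a mapper. This is one reading of the answer (bad records written to HBase, good records forwarded to the reducer); the table name "records", the column family "cf", and the validity check are all hypothetical placeholders:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// With KeyValueTextInputFormat, both the input key and value are Text.
public class CsvMapper extends Mapper<Text, Text, Text, Text> {

    private Connection connection;
    private Table table;

    @Override
    protected void setup(Context context) throws IOException {
        // Initialize the HBase table handle once per mapper task
        // (assumed table name: "records").
        Configuration conf = HBaseConfiguration.create(context.getConfiguration());
        connection = ConnectionFactory.createConnection(conf);
        table = connection.getTable(TableName.valueOf("records"));
    }

    @Override
    protected void map(Text key, Text value, Context context)
            throws IOException, InterruptedException {
        // Hypothetical validity check: treat non-empty values as "good".
        if (value.getLength() > 0) {
            // Good records go on to the reducer.
            context.write(key, value);
        } else {
            // Bad records are written straight to HBase via the Put API.
            Put put = new Put(Bytes.toBytes(key.toString()));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("raw"),
                    Bytes.toBytes(value.toString()));
            table.put(put);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException {
        table.close();
        connection.close();
    }
}
```

Opening the Connection in setup() and closing it in cleanup() avoids paying the connection cost once per record; for heavy write volumes, a BufferedMutator instead of Table.put would batch the writes.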
Upvotes: 2