Reputation: 1616
I have a MapReduce job whose role is to split my input file into two files according to a given criterion.
I am currently using Hadoop r0.20.203 because it is the current stable version.
This version offers two APIs: the old one (org.apache.hadoop.mapred) and the new one (org.apache.hadoop.mapreduce).
As you can imagine, I am using the new API, and my problem is that Hadoop r0.20.203 does not offer any MultipleOutput formats in the new API.
Hadoop 0.20.203 still offers MultipleTextOutputFormat and MultipleOutputs (both of which are suitable for my case) in the old API. Moreover, the newer Hadoop 0.22 offers MultipleOutputs in the new API.
I see four solutions to my problem:
What would you do if you were me?
Upvotes: 0
Views: 1790
Reputation: 51369
Because so much code relies on it, and because the new API (as you have discovered) was never fully implemented, they are probably un-deprecating the old API in a future version of Hadoop. I'd use the old API and not worry about it.
See http://www.cloudera.com/blog/2010/08/what%E2%80%99s-new-in-apache-hadoop-0-21/
Upvotes: 1
Reputation: 20969
Why don't you copy the MultipleOutputs source code from the newer release into your project and use it?
It should be compatible with r0.20.203; I don't see any classes in it that would be unavailable in the older version.
And there is really nothing magic about it: it just sets up several record writers for each configured output (type and so on). I bet you could have written your own in the time it took to write the question.
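To illustrate the "several record writers" idea, here is a minimal sketch in plain Java. It is not the Hadoop class and uses no Hadoop API: `NamedOutputs`, the `opener` factory, and the length-based criterion are all hypothetical names made up for this example. In a real job the writers would presumably be Hadoop RecordWriter instances created from the configured output formats; here they are plain `java.io.Writer` objects so the sketch runs without Hadoop on the classpath.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Plain-Java model of "one writer per named output"; NOT a Hadoop class. */
class NamedOutputs implements AutoCloseable {
    // Hypothetical factory that opens a writer for a given output name.
    private final Function<String, Writer> opener;
    // One writer per output name, created lazily on the first write.
    private final Map<String, Writer> writers = new LinkedHashMap<>();

    NamedOutputs(Function<String, Writer> opener) {
        this.opener = opener;
    }

    /** Routes one record to the writer registered under the given name. */
    void write(String name, String record) {
        Writer w = writers.computeIfAbsent(name, opener);
        try {
            w.write(record);
            w.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Closes every writer that was opened. */
    @Override
    public void close() {
        for (Writer w : writers.values()) {
            try {
                w.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}

class NamedOutputsDemo {
    public static void main(String[] args) {
        // In-memory sinks stand in for the two output files.
        Map<String, StringWriter> sinks = new LinkedHashMap<>();
        try (NamedOutputs out = new NamedOutputs(
                name -> sinks.computeIfAbsent(name, n -> new StringWriter()))) {
            for (String line : new String[] {"apple", "kiwi", "banana", "fig"}) {
                // Hypothetical criterion: split records by length.
                out.write(line.length() > 4 ? "long" : "short", line);
            }
        }
        sinks.forEach((name, sink) -> System.out.print(name + ":\n" + sink));
    }
}
```

The point of the lazy map is that the caller only names an output; whether that name maps to a file, a sequence file, or something else is decided by the factory, which is roughly the role the configured output formats play in the real thing.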
Upvotes: 0