Reputation: 820
When I run my jar on my local Hadoop multi-node cluster, the reducer output is a single file for every job.
But when I run the same jar on Google Cloud, I get multiple output files (part-r-0000*). What I need instead is all the output written to a single file. How do I do that?
Upvotes: 1
Views: 593
Reputation: 4068
Well, one simple solution is to configure the job to run with only one reducer. It seems that on Google Cloud the default setting is different. See here for how to do that: Setting the Number of Reducers in a MapReduce job which is in an Oozie Workflow
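If you control the driver code rather than an Oozie workflow, the relevant call is job.setNumReduceTasks(1). A minimal driver sketch (class and mapper/reducer names are placeholders for your own):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SingleReducerDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "single reducer job");
            job.setJarByClass(SingleReducerDriver.class);
            // job.setMapperClass(MyMapper.class);    // your existing mapper
            // job.setReducerClass(MyReducer.class);  // your existing reducer
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // Force exactly one reduce task so all output lands in a single
            // part-r-00000 file, regardless of the cluster's default.
            job.setNumReduceTasks(1);

            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

You can also pass -D mapreduce.job.reduces=1 on the command line if your driver uses ToolRunner. Keep in mind that a single reducer removes reduce-side parallelism, so it only makes sense when the output is reasonably small.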
Another way to deal with this is to run a concatenating script at the end of your MapReduce job that pieces together all the part-r files, i.e. something like
cat *part-r* >>alloutput
This may be a bit more complex if you have headers, and you also need to copy the part files to local storage first.
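If you would rather avoid the copy-to-local step, the merge can also be done on the cluster filesystem itself. A sketch using FileUtil.copyMerge, assuming Hadoop 2.x (the method was removed in Hadoop 3) and with hypothetical paths:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class MergeOutput {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Path outputDir = new Path(args[0]);   // the job's output directory
            Path mergedFile = new Path(args[1]);  // single destination file
            FileSystem fs = outputDir.getFileSystem(conf);

            // Concatenates every file under outputDir (part-r-00000, part-r-00001, ...)
            // into mergedFile in sorted name order; false = keep the source directory.
            FileUtil.copyMerge(fs, outputDir, mergedFile.getFileSystem(conf),
                    mergedFile, false, conf, null);
        }
    }

The command-line equivalent, hadoop fs -getmerge <outputdir> <localfile>, does the same thing but writes the merged result to the local filesystem.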
Upvotes: 1