kishorer747

Reputation: 820

How to get Mapreduce output in a single file instead of multiple files in Hadoop Cluster on Google Cloud?

When I run my jar on my local Hadoop multi-node cluster, I can see the reducer output, and it is a single file for every job.

But when I run the same jar on Google Cloud, I get multiple output files (part-r-0000*). What I need instead is all of the output written to a single file. How do I do that?

Upvotes: 1

Views: 593

Answers (1)

grasshopper

Reputation: 4068

Well, one simple solution is to configure the job to run with only one reducer. It seems that on Google Cloud the default setting is different. See here for how to do that: Setting the Number of Reducers in a MapReduce job which is in an Oozie Workflow
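
For example, if your driver uses ToolRunner/GenericOptionsParser, you can force a single reducer from the command line with the mapreduce.job.reduces property. This is just a sketch; the jar name, driver class, and bucket paths below are placeholders:

# Placeholder jar, class and paths; assumes the driver uses ToolRunner
hadoop jar myjob.jar com.example.MyDriver \
    -D mapreduce.job.reduces=1 \
    gs://my-bucket/input gs://my-bucket/output

Alternatively, you can call job.setNumReduceTasks(1) in the driver code itself.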

Another way to deal with this is to have a concatenating script run at the end of your MapReduce job that pieces together all the part-r files, i.e. something like

cat *part-r* >>alloutput

This may be a bit more complex if you have headers, and you also need to copy the files to local first.
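
A shortcut that avoids the separate copy-then-cat steps is hadoop fs -getmerge, which concatenates every file under the output directory into a single local file. The paths below are placeholders, and this assumes the GCS connector is configured so gs:// paths are visible to the fs commands:

# Merge all part files from the job output directory into one local file
hadoop fs -getmerge gs://my-bucket/output /tmp/alloutput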

Upvotes: 1
