barath

Reputation: 842

Can there be a scenario in Hadoop with only 1 map task and 0 reduce tasks?

I know that the output of the map phase is an intermediate result that serves as the input to the reduce phase.

Recently, I read in Hadoop: The Definitive Guide that "results of Map-tasks are stored in disk (i.e. not in HDFS, as they are an intermediate result) and only the results of Reduce-phase are stored in HDFS".

Based on that sentence, my understanding is that if there is a map task, there must also be a reduce task: the output of a map task is only an intermediate result, so a reduce task is needed to store that result in HDFS. Is my understanding correct?

If my understanding is wrong, can anyone give me a scenario with 1 map task and 0 reduce tasks?

Upvotes: 0

Views: 596

Answers (3)

Kfactor21

Reputation: 412

For the benefit of future readers: in the Hadoop ecosystem I work with (2.7.1, Tez execution framework), there are extract jobs that read data from flat files, databases, and cloud apps such as Salesforce into HDFS. These jobs perform no transformation on the data, so they have only map tasks and no reduce tasks. The settings do not enforce a default number of reducers.

Upvotes: 0

Shailvi

Reputation: 105

Yes. When there are zero reducers, the output of the map task is not intermediate output but the final output. No shuffling or partitioning takes place in this case; the mapper's output is written directly to the job's output location (typically HDFS).

Upvotes: 0

Ranga Vure

Reputation: 1932

In MapReduce, a reduce phase is not always required. For pure transformations, where each input record only needs to be transformed, no reducer is needed.

In those scenarios, the number of reducers is set to 0, or (in Hadoop Streaming) the -reducer option is set to NONE. In these cases the mapper output is stored in HDFS.
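
To make this concrete, here is a minimal sketch of a map-only Hadoop Streaming mapper. The script name mapper.py, the CSV-to-tab transformation, the jar location, and the input/output paths are all assumptions chosen for illustration; only the zero-reducer setting comes from the discussion above.

```python
#!/usr/bin/env python3
# Illustrative map-only mapper for Hadoop Streaming.
#
# Hypothetical invocation (jar location and paths are placeholders):
#
#   hadoop jar hadoop-streaming.jar
#       -D mapreduce.job.reduces=0
#       -files mapper.py
#       -input /data/in -output /data/out
#       -mapper mapper.py
#
# With zero reducers there is no shuffle, so whatever this script prints
# becomes the job's final output.
import sys


def transform(line):
    """Normalize one CSV record: trim each field, upper-case it, join with tabs."""
    fields = [field.strip().upper() for field in line.rstrip("\n").split(",")]
    return "\t".join(fields)


def run(lines, out):
    """Apply transform to every non-blank input line and write one line each."""
    for line in lines:
        if line.strip():
            out.write(transform(line) + "\n")


if __name__ == "__main__":
    run(sys.stdin, sys.stdout)
```

The script can be checked locally without a cluster by piping a file through it, for example cat input.csv | python3 mapper.py, since a streaming mapper only reads standard input and writes standard output.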

Upvotes: 0
