Is it possible to perform any MapReduce task with a single reducer?

Question

What if the output is so big that it does not fit into the reducer's RAM? Take a sorting task, for example: the output is as big as the input. With a single reducer, the data will not all fit into RAM. How does the sorting take place then?

Roy · Accepted Answer

I think I have the answer. Yes, it is possible to run any MapReduce task with a single reducer, even if the data are bigger than the reducer's memory. In the shuffle phase the reducer copies the map output into its own memory and sorts it until the in-memory buffer fills up. At that point the buffered data are spilled to the reducer's local disk as a sorted file, and the buffer starts accepting new values. When it spills again, the new data are merged with the previously stored file, and the merged file stays sorted (probably using an external merge sort). Once the shuffle is done, all the intermediate key/value pairs are stored on disk in sorted order. The reduce task then runs over that data. Because the data are sorted, all values for a given key sit next to each other, so the aggregation can be done by reading one chunk of the data into memory at a time rather than holding everything at once.
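To make the spill-and-merge idea concrete, here is a minimal Python sketch of an external merge sort feeding a streaming reduce. It is only an illustration of the technique, not how Hadoop actually implements its shuffle. The names `spill_sorted_runs`, `single_reducer`, and `max_in_memory` are made up for this example, and the "memory limit" is simply a count of buffered pairs.

```python
import heapq
import itertools
import pickle
import tempfile
from operator import itemgetter

KEY = itemgetter(0)  # sort and group on the key of each (key, value) pair


def _write_run(sorted_pairs):
    """Write one sorted run to a temporary file and rewind it for reading."""
    f = tempfile.TemporaryFile()
    for pair in sorted_pairs:
        pickle.dump(pair, f)
    f.seek(0)
    return f


def _read_run(f):
    """Stream pairs back from a run file one at a time."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def spill_sorted_runs(pairs, max_in_memory):
    """Buffer at most max_in_memory pairs; sort and spill each full buffer."""
    runs, buffer = [], []
    for pair in pairs:
        buffer.append(pair)
        if len(buffer) >= max_in_memory:  # hypothetical "memory is full" condition
            runs.append(_write_run(sorted(buffer, key=KEY)))
            buffer = []
    if buffer:  # spill whatever is left at the end
        runs.append(_write_run(sorted(buffer, key=KEY)))
    return runs


def single_reducer(pairs, reduce_fn, max_in_memory=3):
    """Yield (key, reduce_fn(values)) in key order using sorted runs on disk."""
    runs = spill_sorted_runs(pairs, max_in_memory)
    try:
        # k-way merge of the sorted runs; only one pair per run is held in memory
        merged = heapq.merge(*(_read_run(f) for f in runs), key=KEY)
        # equal keys are adjacent in sorted order, so groupby can stream them
        for key, group in itertools.groupby(merged, key=KEY):
            yield key, reduce_fn(value for _, value in group)
    finally:
        for f in runs:
            f.close()
```

For instance, `list(single_reducer([("b", 1), ("a", 2), ("c", 3), ("a", 4), ("b", 5), ("c", 6), ("a", 7)], sum, max_in_memory=3))` spills three runs and returns `[("a", 13), ("b", 6), ("c", 9)]`. Passing `list` instead of `sum` as `reduce_fn` gives a plain sort, which is the case asked about in the question.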
