Reputation: 358
I am beginner with MapReduce, and currently reading the book Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer (link to PDF)
Anyways, the first example the book provides is a word counting algorithm, and I am having trouble understanding why the final output of the reducer is what it is. The example is on page number 23 of the text, figure 2.2. To my understanding X should be 6, Y should be 9, and Z should be 19.
Upvotes: 0
Views: 2189
Reputation: 1214
A paragraph later:
Final output is written to the distributed file system, one file per reducer. Words within each file will be sorted by alphabetical order, and each file will contain roughly the same number of words. The partitioner, which we discuss later in Section 2.4, controls the assignment of words to reducers. The output can be examined by the programmer or used as input to another MapReduce program.
As far as I can understand, each Reducer in the example writes its output to a different file. I agree this should be explained before the figure as specified in some comments.
Upvotes: 0
Reputation: 900
Your input file which is fed to mapper will look like: Rec1: a,1 Rec2: b,2 Rec3: c,3 Rec4: c,6 Rec5: a,5 Rec6: c,2 Rec7; b,7 Rec8: c,8
Records #1 and #2 & will be processed by Mapper #1. In this example it is assumed that above file is stored in 4 blocks. Rec #1 & Rec #2 ( & ) in 1st block. Rec #3 and Rec #4 in 2nd block ( & ). Rec #5 and Rec #6 ( & ) will be in 3rd block. Rec 7 and Rec8 will be stored in 4th block Rec7 & Rec8 ().
In Map-Reduce framework one mapper will be invoked for each input split (Logically same as block). Each Mapper will process all the record in an input split.
M1 will take input for ( & ) and emit a as key and 1 as value and emit b as key and 1 as a value. For M2 input is c,3 and c,6 and and it emits key as c and value as 3 and key as c and value is 6 and so on.
And then reducer will accepts those keys and performs its processing.
Hope this clarifies your question.
Upvotes: 1
Reputation: 900
I guess you got confused. Fig 2.2 is not an example of word count algorithm. It shows two level processing of Map-Reduce Frame Work described in section" MAPPERS AND REDUCERS". Fig 2.2 shows simplified view of Map-Reduce. Author wants to show how Map Reduce frame work words with this figure. And this shows how to find maximum of all the values associated with the key. That's y output is x=5 (max of 1,5) etc.
If you want to read word count algorithm via map-reduce please look at the :
class Mapper 2: method Map(docid a; doc d) 3: for all term t 2 doc d do 4: Emit(term t; count 1) 1: class Reducer 2: method Reduce(term t; counts [c1; c2; : : :]) 3: sum 0 4: for all count c 2 counts [c1; c2; : : :] do 5: sum sum + c 6: Emit(term t; count sum) Figure 2.3: Pseudo-code for the word count algorithm in MapReduce. The mapper emits an intermediate key-value pair for each word in a document. The reducer sums up all counts for each word.
May be confusion arises because picture is pasted first and its number is pasted at the bottom of picture.
Hope this helps !!
Upvotes: 0