Reputation: 28453
What is the point of map? For example, instead of the below, why not just use: reducer = (accum, x) => accum + (x + 2)?
Or with mapper and reducer separate:
mapper = (x) => x + 2
reducer = (accum, y) => accum + y
So:
// x y
// 0 2
// 1 3
[0, 1].map(mapper).reduce(reducer, 0) // result == 5
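For reference, running the fused reducer from the question alongside the separate mapper/reducer pair confirms they produce the same result:

```javascript
// Fused version from the question: the map step folded into the reducer.
const fused = (accum, x) => accum + (x + 2);

// Separate mapper and reducer, as above.
const mapper = (x) => x + 2;
const reducer = (accum, y) => accum + y;

console.log([0, 1].reduce(fused, 0));               // 5
console.log([0, 1].map(mapper).reduce(reducer, 0)); // 5
```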
Are there examples in "big data technologies" like Hadoop where moving all the functionality into the reducer is undesirable, or incurs some penalty that is avoided by having a separate mapper?
I can think of examples where knowing the initial value is actually required in the reducer, making the use of a "purely map" mapper function impossible, or at least pointless, as you'd have to map to a value that contains the initial value, e.g. having mapper return a tuple containing the initial value so that reducer can access it:
mapper = (x) => [x, lookupValue1[x] * lookupValue2[x]]
reducer = (accum, y) => { accum[y[0]] = y[1]; return accum; }
// x y
// 'alex' ['alex', -41]
// 'chris' ['chris', 102]
['alex', 'chris'].map(mapper).reduce(reducer, {})
// result = { 'alex': -41, 'chris': 102 }
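To make that snippet runnable, here is the same example with hypothetical lookupValue1 and lookupValue2 tables (invented here purely to reproduce the values above):

```javascript
// Hypothetical lookup tables, chosen so the products match the example.
const lookupValue1 = { alex: 41, chris: 51 };
const lookupValue2 = { alex: -1, chris: 2 };

const mapper = (x) => [x, lookupValue1[x] * lookupValue2[x]];
const reducer = (accum, y) => { accum[y[0]] = y[1]; return accum; };

console.log(['alex', 'chris'].map(mapper).reduce(reducer, {}));
// { alex: -41, chris: 102 }
```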
Upvotes: 0
Views: 77
Reputation: 5538
Think of MapReduce as a design pattern for efficiently processing "suitable" data. By this, I mean two things:
1) MapReduce is not an efficient way to process every type of data. There are certain types of data and processing steps that can leverage HDFS and distributed processing; MapReduce is just one tool in that league, best suited to certain algorithms.
2) Not all algorithms are suitable for MapReduce. Because it is a design pattern, it best suits algorithms that are in line with its design. That is why the MapReduce core library allows you to skip the mapper (using an identity mapping) or the reducer (by setting the number of reducers to zero). You are allowed to skip one or more phases of MapReduce according to your need.
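In JavaScript terms, skipping the map phase corresponds to using the identity function as the mapper, which leaves the data unchanged and pushes all the work into the reducer:

```javascript
// Identity mapping: skipping the map phase is equivalent to mapping each
// record to itself, so map(identity).reduce(...) === reduce(...).
const identity = (x) => x;
const sum = (accum, y) => accum + y;

console.log([1, 2, 3].map(identity).reduce(sum, 0)); // 6
console.log([1, 2, 3].reduce(sum, 0));               // 6, same result
```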
Keeping these two points in mind, if you understand how map - combine - sort + shuffle - reduce works, it can help you implement an algorithm more efficiently than with any other tool. At the same time, if your data and algorithm are really not a 'fit' for MapReduce, you could end up with a highly inefficient MapReduce program.
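As a rough single-process JavaScript analogy (not actual Hadoop code), the phases can be sketched like this, with the combiner pre-aggregating each split's output before the shuffle:

```javascript
// Toy sketch of map -> combine -> sort+shuffle -> reduce over two "splits".
const splits = [['a b a'], ['b a']];

// map: each split emits [word, 1] pairs
const mapped = splits.map((split) =>
  split.flatMap((line) => line.split(' ').map((w) => [w, 1]))
);

// combine: pre-aggregate per split, shrinking the data sent to the shuffle
const combine = (pairs) => {
  const local = {};
  for (const [k, v] of pairs) local[k] = (local[k] || 0) + v;
  return Object.entries(local);
};
const combined = mapped.map(combine);

// sort + shuffle: group all partial counts by key
const groups = {};
for (const [k, v] of combined.flat()) (groups[k] ||= []).push(v);

// reduce: sum the partial counts for each key
const result = Object.fromEntries(
  Object.entries(groups).map(([k, vs]) => [k, vs.reduce((a, b) => a + b, 0)])
);

console.log(result); // { a: 3, b: 2 }
```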
If you wish to research the significance of the mapper in MapReduce, just study the wordcount example program (which comes bundled with MapReduce). Try implementing it with and without the mapper (or the reducer, or MapReduce altogether) and benchmark the performance. I hope you will find the answer.
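As a toy JavaScript analogy of that experiment (not Hadoop itself), here is word count written with and without a separate mapper:

```javascript
const words = 'the quick the lazy the'.split(' ');

// With a separate mapper: map each word to a [word, 1] pair first.
const withMapper = words
  .map((w) => [w, 1])
  .reduce((acc, [w, n]) => { acc[w] = (acc[w] || 0) + n; return acc; }, {});

// Without a mapper: the reducer does everything itself.
const withoutMapper = words
  .reduce((acc, w) => { acc[w] = (acc[w] || 0) + 1; return acc; }, {});

console.log(withMapper);    // { the: 3, quick: 1, lazy: 1 }
console.log(withoutMapper); // same result
```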
Upvotes: 1