Yuhao

Reputation: 1610

How does Hadoop MapReduce handle identical key/value pairs before the mapper and the reducer?

Will it keep one and delete the others, or pass them all to the mapper and the reducer?

Upvotes: 0

Views: 2734

Answers (3)

Mosab Shaheen

Reputation: 1174

The MapReduce model reads them independently in the map phase and groups them by key in the reduce phase, as @saurabh mentioned.

Upvotes: 0

saurabh shashank

Reputation: 1353

Identical Key-Value Pairs


Map Phase

Since key-value pairs are independent of each other, the mapper never looks for (or knows about) an identical key-value pair. Each pair is passed to map() on its own.

Example input:

key  value
1    2
1    2
2    5
3    19


map(k, v)
{
  emit(k, v)    // identity map: every pair is emitted, duplicates included
}

Emitted: (1, 2), (1, 2), (2, 5), (3, 19)
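A minimal in-memory sketch of the map phase, assuming the example input above. The function names (identity_map, run_map_phase) are hypothetical and this is not Hadoop's API; it only shows that each record is handed to the mapper separately, so the duplicate (1, 2) pair is emitted twice.

```python
# Hypothetical simulation of the map phase (not Hadoop's API).
# Each record is passed to the mapper on its own; records are never compared.

def identity_map(key, value):
    """Mirror of the pseudocode: emit the input pair unchanged."""
    yield key, value

def run_map_phase(records, mapper):
    """Call the mapper once per input record and collect everything it emits."""
    emitted = []
    for key, value in records:
        emitted.extend(mapper(key, value))
    return emitted

records = [(1, 2), (1, 2), (2, 5), (3, 19)]
print(run_map_phase(records, identity_map))
# [(1, 2), (1, 2), (2, 5), (3, 19)]
```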


Reduce Phase

Identical key-value pairs are handled by sorting and grouping on the key only. The values are never compared with each other, so every value, including a duplicate, is kept as a separate entry in the list passed to the reducer for that key.

Reducer input:

key  values
1    {2, 2}
2    {5}
3    {19}
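The grouping step can be sketched as a sort on the key followed by a group-by, assuming the map output shown earlier. shuffle_and_sort is a hypothetical name; real Hadoop performs this across the network with on-disk merges, and by default does not guarantee the order of values within a key.

```python
from itertools import groupby
from operator import itemgetter

def shuffle_and_sort(intermediate):
    """Sort intermediate pairs by key only, then collect the values per key.
    Values are never compared, so duplicate values stay as separate entries."""
    ordered = sorted(intermediate, key=itemgetter(0))  # sort on the key alone
    return [(key, [v for _, v in group])
            for key, group in groupby(ordered, key=itemgetter(0))]

map_output = [(1, 2), (1, 2), (2, 5), (3, 19)]
for key, values in shuffle_and_sort(map_output):
    print(key, values)
# 1 [2, 2]
# 2 [5]
# 3 [19]
```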

Upvotes: 2

Praveen Sripati

Reputation: 33555

The Hadoop framework does not ignore or delete duplicate KV pairs. Any filtering or modification of the KV pairs has to be done in the user-defined map and reduce functions.
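One common way to remove duplicates in user code is to make the whole pair the intermediate key, so identical pairs land in the same reduce call, and have the reducer emit that key once. The sketch below simulates this in memory; distinct_map, distinct_reduce and run_job are hypothetical names, not Hadoop classes.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical simulation: deduplication done by user functions, not the framework.

def distinct_map(key, value):
    # Use the whole pair as the intermediate key; the value carries no data.
    yield (key, value), None

def distinct_reduce(pair, _values):
    # Called once per distinct pair, however many copies arrived.
    yield pair

def run_job(records, mapper, reducer):
    """Map every record, sort by intermediate key, call reduce once per key."""
    intermediate = [kv for k, v in records for kv in mapper(k, v)]
    intermediate.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        output.extend(reducer(key, [v for _, v in group]))
    return output

records = [(1, 2), (1, 2), (2, 5), (3, 19)]
print(run_job(records, distinct_map, distinct_reduce))
# [(1, 2), (2, 5), (3, 19)]
```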

The framework reads the input data and invokes the user-defined map function with each input record as a KV pair; the map function emits KV pairs after some processing. These intermediate KV pairs are sorted/merged, and the user-defined reduce function is then invoked once for each key with all of that key's values; the reduce function in turn emits KV pairs.
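The flow above can be modelled end to end in memory. This is a sketch under stated assumptions: parse_map, sum_reduce and run_job are hypothetical names, the line index stands in for the byte offset Hadoop's text input would supply, and the reduce calls are recorded so you can see one invocation per key. Because nothing is deduplicated, the sum for key 1 counts both copies of the value 2.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical in-memory model of a job; real Hadoop distributes these steps
# across map and reduce tasks and merges sorted data on disk.

def parse_map(offset, line):
    # Example user map: parse a "key value" line into an (int, int) pair.
    k, v = line.split()
    yield int(k), int(v)

def sum_reduce(key, values):
    # Example user reduce: called once per key with every value for that key.
    yield key, sum(values)

def run_job(lines, mapper, reducer):
    intermediate = []
    for offset, line in enumerate(lines):    # line index stands in for offset
        intermediate.extend(mapper(offset, line))
    intermediate.sort(key=itemgetter(0))     # sort/merge by key
    calls, output = [], []
    for key, group in groupby(intermediate, key=itemgetter(0)):
        values = [v for _, v in group]
        calls.append((key, values))          # record each reduce invocation
        output.extend(reducer(key, values))
    return calls, output

lines = ["1 2", "1 2", "2 5", "3 19"]
calls, output = run_job(lines, parse_map, sum_reduce)
print(calls)   # [(1, [2, 2]), (2, [5]), (3, [19])]
print(output)  # [(1, 4), (2, 5), (3, 19)]
```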

I would suggest getting Hadoop: The Definitive Guide, 3rd Edition for a clearer picture of MapReduce and Hadoop.

Upvotes: 2
