Reputation: 1610
Will it keep one and delete the others, or pass all of them to the mapper and reducer?
Upvotes: 0
Views: 2734
Reputation: 1174
The MapReduce model reads each pair independently in the Map phase and groups them by key for the Reduce phase, as @saurabh mentioned.
Upvotes: 0
Reputation: 1353
Same KeyValue Pair
Key-value pairs are independent of each other, so the mapper never looks for, or knows about, an identical key-value pair.
ex:
key value
1 2
1 2
2 5
3 19
map(k, v)
{
    emit(k, v)
}
emits: (1,2) (1,2) (2,5) (3,19)
Identical key-value pairs are handled by sorting and grouping on the key alone. The values are not compared, so every value, duplicate or not, is kept as a separate entry in the list for its key.
ex:
key value
1 {2,2}
2 {5}
3 {19}
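To make the example above concrete, here is a minimal sketch in plain Python (not the Hadoop API) that simulates the identity mapper and the group-by-key step. The names `map_fn` and `shuffle` are hypothetical helpers made up for illustration; real Hadoop does this across nodes with sorting and merging, but the effect on duplicates should be the same.

```python
from collections import defaultdict

def map_fn(key, value):
    # Identity mapper: emits every input pair unchanged, duplicates included.
    yield key, value

def shuffle(pairs):
    # Group values by key only. Values are never compared with each other,
    # so identical pairs simply add repeated entries to the key's list.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

records = [(1, 2), (1, 2), (2, 5), (3, 19)]

# Map phase: each record is processed independently of the others.
emitted = [pair for k, v in records for pair in map_fn(k, v)]

# Shuffle/sort phase: what the reducer would see, one list of values per key.
grouped = shuffle(emitted)

print(emitted)  # [(1, 2), (1, 2), (2, 5), (3, 19)]
print(grouped)  # {1: [2, 2], 2: [5], 3: [19]}
```

Both copies of (1, 2) reach the reducer as the value list [2, 2] for key 1; nothing is dropped along the way.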
Upvotes: 2
Reputation: 33555
The Hadoop framework will not ignore or delete duplicate KV pairs. Any filtering or modification of KV pairs has to be done in the user-defined map and reduce functions.
The framework reads the input data and invokes the user-defined map function with the input as KV pairs; the map function emits KV pairs after some processing. These intermediate KV pairs are sorted and merged, and the user-defined reduce function is then invoked once for each key, with all the values for that key. The reduce function in turn emits KV pairs.
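If you do want duplicates removed, that logic has to live in your own code. As a rough sketch in plain Python (again not the Hadoop API; `dedup_reduce` and `run_reduce` are hypothetical names used only for illustration), a reducer could emit each distinct value once per key:

```python
def dedup_reduce(key, values):
    # User-defined reducer: emit each distinct value once for this key.
    for value in sorted(set(values)):
        yield key, value

def run_reduce(grouped, reduce_fn):
    # Stand-in for the framework: call the reducer once per key, in key order.
    output = []
    for key in sorted(grouped):
        output.extend(reduce_fn(key, grouped[key]))
    return output

# Grouped intermediate data, including the duplicate value for key 1.
grouped = {1: [2, 2], 2: [5], 3: [19]}

result = run_reduce(grouped, dedup_reduce)
print(result)  # [(1, 2), (2, 5), (3, 19)]
```

The framework stand-in passes both copies of the value through untouched; only the reducer decides to collapse them.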
I would suggest getting Hadoop: The Definitive Guide, 3rd Edition, for a clearer understanding of MapReduce and Hadoop.
Upvotes: 2