Lowblow

Reputation: 109

Is it possible to output multiple values under the same key in MapReduce?

If I am running a MapReduce job, am I allowed to have:

context.write(key, value1)
context.write(key, value2)
context.write(key, value3) ....

in my mapper function? Would this behave like the Map class in Java and overwrite the pre-existing values?

Upvotes: 2

Views: 2927

Answers (2)

vefthym

Reputation: 7462

Yes, you can have multiple values for the same key. The map function in MapReduce is not like the Map structure in Java. You can, however, think of it as a Multimap or a hash table, if that analogy is easier for you: you can put multiple values in the same bucket.
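To make the contrast concrete, here is a minimal plain-Java sketch. It does not use Hadoop at all: the class and method names (EmitVersusPut, putAll, groupByKey) are made up for illustration, and groupByKey only imitates the grouping that the framework performs between the map and reduce phases.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; no Hadoop classes are used here.
// All names in this class are hypothetical.
class EmitVersusPut {

    // java.util.Map semantics: a later put() on the same key replaces the earlier value.
    static Map<String, Integer> putAll(String[] keys, int[] values) {
        Map<String, Integer> map = new HashMap<>();
        for (int i = 0; i < keys.length; i++) {
            map.put(keys[i], values[i]);
        }
        return map;
    }

    // Multimap-like semantics: every emitted pair is kept and grouped under its key.
    static Map<String, List<Integer>> groupByKey(String[] keys, int[] values) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (int i = 0; i < keys.length; i++) {
            grouped.computeIfAbsent(keys[i], k -> new ArrayList<>()).add(values[i]);
        }
        return grouped;
    }

    public static void main(String[] args) {
        String[] keys = {"key", "key", "key"};
        int[] values = {1, 2, 3};
        System.out.println(putAll(keys, values));      // {key=3}
        System.out.println(groupByKey(keys, values));  // {key=[1, 2, 3]}
    }
}
```

Note that in a real job the order in which the values for one key reach the reducer is not guaranteed, whereas this sketch simply keeps insertion order.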

See an example in the following WordCount program* (see the second mapper, emitting the key C twice). Those key-value pairs will end up in the same bucket (reduce task):

[Figure: WordCount example]

However, there is a catch: for efficiency reasons, you should usually avoid this kind of reduce-side join when a map-side join is applicable, as in your case. If, for example, you could emit (key, [value1,value2,value3,...]) in the mapper, this would usually be faster, because less data needs to be transferred and joined. Since you already know that those three values will end up in the same reducer, you could process them in the mapper as the reducer would, or do some pre-processing that lets the reducer perform fewer computations (alternatively, you could use a combiner for this purpose). In the previous figure, it would be faster for the mapper to emit (C,2) in the first place.

*The reduce phase is not depicted correctly in the figure, but this is irrelevant to the question.
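The idea of emitting (C,2) once instead of (C,1) twice can be sketched in plain Java as follows. This is only an illustration under stated assumptions: it does not use the Hadoop API, the names LocalAggregation, emitPerOccurrence and emitAggregated are hypothetical, and the input line "C B C" is invented rather than taken from the figure.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of pre-aggregating inside the mapper.
// No Hadoop classes are used; all names here are hypothetical.
class LocalAggregation {

    // Naive mapper: emits one (word, 1) pair per occurrence.
    static List<Map.Entry<String, Integer>> emitPerOccurrence(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            out.add(Map.entry(word, 1));
        }
        return out;
    }

    // Pre-aggregating mapper: sums counts locally, then emits one pair per distinct word.
    static List<Map.Entry<String, Integer>> emitAggregated(String line) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : line.trim().split("\\s+")) {
            counts.merge(word, 1, Integer::sum);
        }
        return new ArrayList<>(counts.entrySet());
    }

    public static void main(String[] args) {
        System.out.println(emitPerOccurrence("C B C")); // [C=1, B=1, C=1]
        System.out.println(emitAggregated("C B C"));    // [C=2, B=1]
    }
}
```

The second variant hands fewer pairs to the shuffle, which is where the saving would come from. Both variants lead to the same totals per word once the reducer has summed the values.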

Upvotes: 1

Binary Nerd

Reputation: 13937

Yes, you can do this. You are effectively emitting a new key/value pair each time you call context.write(), so each call is independent of the last; it isn't really comparable to a Map.

Upvotes: 0
