Hadoop map/reduce structure

Question

I'd like to know whether I can specify multiple "keys" for a mapper/reducer. Say, for example, that I have the following class:

 class A {
     String region;
     String name;
     int age;
     // ..... more attributes
 }

I want to extract information based on three different keys: the age, region and name.

For example, with the age as the key:

< age, attributes related to age >

Then with the name as the key:

< name, attributes related to name >

And similarly for the region. My question is: do I have to create a separate map/reduce job for each key, or can I do this (safely) in a single map/reduce job?

Venkat · Accepted Answer

Yes, you can do this in a single MapReduce job.

Your mapper reads the input data, which I assume is serialized in a structure similar to your class (for example, as a custom Writable).

From the mapper, emit each output with a composite key made of two parts, what you are collecting and its value, e.g. Age:18. The key can be a Text or, again, a custom Writable (a custom key type must implement WritableComparable).

Depending on your use case, you may need a custom partitioner to ensure that all keys tagged Age go to one reducer and all keys tagged Name go to another. Without one, the default hash partitioner still sends all records with the same full key (e.g. Age:18) to the same reducer, but keys with different ages may land on different reducers.
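To make the composite-key idea concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The `Person` class, the `Age:`/`Name:`/`Region:` tags, and the choice of a comma-joined string as the value are all assumptions for illustration; in a real job each pair would be written from `map()` with `context.write(...)`, typically with a `Text` key.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TaggedKeyEmitter {
    // Hypothetical record standing in for class A from the question.
    static final class Person {
        final String region;
        final String name;
        final int age;

        Person(String region, String name, int age) {
            this.region = region;
            this.name = name;
            this.age = age;
        }
    }

    // Emits one (taggedKey, value) pair per attribute of interest.
    // The value is a placeholder (all fields joined by commas); a real
    // job would emit only the attributes relevant to each key.
    static List<Map.Entry<String, String>> map(Person p) {
        String value = p.region + "," + p.name + "," + p.age;
        List<Map.Entry<String, String>> out = new ArrayList<>();
        out.add(new AbstractMap.SimpleEntry<>("Age:" + p.age, value));
        out.add(new AbstractMap.SimpleEntry<>("Name:" + p.name, value));
        out.add(new AbstractMap.SimpleEntry<>("Region:" + p.region, value));
        return out;
    }

    public static void main(String[] args) {
        // Print the three tagged pairs produced for one input record.
        for (Map.Entry<String, String> e : map(new Person("EU", "Ann", 18))) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

The reducer can then inspect the tag before the colon to decide how to aggregate each group.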
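The partitioning logic could look roughly like the following sketch, again in plain Java so it runs without Hadoop. The class name `TagPartitioner` and the fixed tag-to-slot mapping are assumptions; in an actual job this logic would live in a subclass of Hadoop's `Partitioner`, overriding `getPartition(key, value, numPartitions)`, and be registered with `job.setPartitionerClass(...)`.

```java
class TagPartitioner {
    // Routes a tagged key such as "Age:18" to a partition chosen by its tag
    // only, so every key with the same tag reaches the same reducer.
    static int getPartition(String key, int numPartitions) {
        int colon = key.indexOf(':');
        String tag = colon < 0 ? key : key.substring(0, colon);
        int slot;
        switch (tag) {
            case "Age":
                slot = 0;
                break;
            case "Name":
                slot = 1;
                break;
            case "Region":
                slot = 2;
                break;
            default:
                // Unknown tags fall back to a non-negative hash of the tag.
                slot = tag.hashCode() & Integer.MAX_VALUE;
        }
        // With fewer than three reducers, several tags share a partition.
        return slot % numPartitions;
    }
}
```

With three or more reducers, the Age, Name, and Region groups each get their own reducer. The trade-off is that each tag is then handled by a single reducer, which limits parallelism for large inputs.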
