Reputation: 2790
How can I take the input set
{worker-id:1 name:john supervisor-id:3}
{worker-id:2 name:jane supervisor-id:3}
{worker-id:3 name:bob}
and produce the output set
{worker-id:1 name:john supervisor-name:bob}
{worker-id:2 name:jane supervisor-name:bob}
using a "pure" map-reduce framework, i.e. one with only a map phase and a reduce phase but without any extra feature such as CouchDB's lookup?
Upvotes: 3
Views: 1551
Reputation: 46408
Exact details will depend on your map-reduce framework, but the idea is this. In your map phase, each input record emits up to two key/value pairs: one keyed by its own worker-id and tagged as a boss, and one keyed by its supervisor-id and tagged as a worker. For john that is (1, {name:john type:boss}) and (3, {worker-id:1 name:john type:worker}). In your reduce phase you get all of the values for a key grouped together. If there is a record of type boss in there, you remove that record and use its name to populate the supervisor-name of the other records. If there isn't, you drop those records on the floor.
Basically you use the fact that data gets grouped by key then processed together in the reduce to do the join.
(In some map-reduce implementations you incrementally get key/value pairs put together in the reduce. In those implementations you can't throw away records that don't have a boss already, so you wind up needing to map-reduce-reduce for that final filtering step.)
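To make the join concrete, here is a minimal sketch in plain Python that simulates the map, group-by-key, and reduce steps in memory. It is not tied to any real framework: the function names map_record, reduce_group and run are made up for illustration, and it assumes the reducer sees all values for a key at once (the non-incremental case).

```python
from collections import defaultdict

def map_record(record):
    # Hypothetical mapper: emit the record as a potential boss,
    # keyed by its own worker-id.
    yield record["worker-id"], {"type": "boss", "name": record["name"]}
    # If it has a supervisor, also emit it as a worker,
    # keyed by the supervisor's id.
    if "supervisor-id" in record:
        yield record["supervisor-id"], {
            "type": "worker",
            "worker-id": record["worker-id"],
            "name": record["name"],
        }

def reduce_group(key, values):
    # Hypothetical reducer: values is the full list for one key.
    boss = next((v for v in values if v["type"] == "boss"), None)
    if boss is None:
        return  # no boss record for this key: drop the workers
    for v in values:
        if v["type"] == "worker":
            yield {
                "worker-id": v["worker-id"],
                "name": v["name"],
                "supervisor-name": boss["name"],
            }

def run(records):
    # Stand-in for the framework's shuffle: group mapper output by key.
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):
        output.extend(reduce_group(key, groups[key]))
    return output

workers = [
    {"worker-id": 1, "name": "john", "supervisor-id": 3},
    {"worker-id": 2, "name": "jane", "supervisor-id": 3},
    {"worker-id": 3, "name": "bob"},
]
result = run(workers)
```

With the input set from the question, result holds the two joined records for john and jane, each with supervisor-name bob. Keys 1 and 2 contain only a boss record and no workers, so they produce nothing.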
Upvotes: 3
Reputation: 1474
Is there only one input file, or more? I mean, could there be a case where a worker-id in one file has a supervisor-id whose description (the name for that supervisor-id) is in another file?
Upvotes: 0