nomoa

Reputation: 1052

Map Reduce algorithm design (MongoDB preferred)

I'm quite new to map reduce design. I use MongoDB as both the backend and the map reduce engine.

On a simple dataset like :

day, value

where value is -1, 0 or 1, I want to add a duration to each row, where duration is the number of consecutive preceding days on which the value has stayed at 1 or at -1.

Example input data set:

       day| value
2012-01-01|  1
2012-01-02|  1
2012-01-03|  1
2012-01-04| -1
2012-01-05| -1
2012-01-06|  0
2012-01-07|  1
2012-01-08|  1

Output should be :

       day| value | Duration
2012-01-01|  1    | 0
2012-01-02|  1    | 1
2012-01-03|  1    | 2
2012-01-04| -1    | 0
2012-01-05| -1    |-1
2012-01-06|  0    | 0
2012-01-07|  1    | 0
2012-01-08|  1    | 1

Is this feasible in a map reduce job?

Upvotes: 1

Views: 197

Answers (1)

cliffycheng

Reputation: 381

Someone correct me if I'm wrong, but this doesn't look feasible for MapReduce. I'm not sure how MongoDB partitions its input among its mappers, but if I remember correctly, MapReduce can't handle tasks that depend on data outside a single mapper's chunk.

MR can do this job within a single chunk. Say that days 01/01 to 01/02 from your example are sent to one mapper. That mapper can certainly detect that the two days have the same value in a row.

However, what if another mapper gets days 01/03 to 01/04? This mapper won't know that days 1 and 2 before it have the same value as day 3 does, so it'll just output that day 3's duration is 0. There's no way to get the data from a different mapper, as far as I can see.
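To make the boundary problem concrete, here is a small Python simulation (not MongoDB code). It assumes, purely for illustration, that the eight example rows are split into two-day chunks and that each chunk is scanned with no knowledge of its neighbours; `run_lengths` is a made-up helper name. It counts durations as non-negative numbers, so 2012-01-05 gets 1 where the question's table shows -1.

```python
def run_lengths(values):
    """Duration per row: how many rows the current run of equal non-zero values has lasted."""
    durations = []
    prev = None
    run = 0
    for v in values:
        if v != 0 and v == prev:
            run += 1      # same non-zero value as the previous row: extend the run
        else:
            run = 0       # new value, or a 0: the run restarts
        durations.append(run)
        prev = v
    return durations


values = [1, 1, 1, -1, -1, 0, 1, 1]  # 2012-01-01 .. 2012-01-08 from the question

# Hypothetical partitioning: four independent chunks of two days each.
chunks = [values[i:i + 2] for i in range(0, len(values), 2)]

per_chunk = [d for chunk in chunks for d in run_lengths(chunk)]
whole = run_lengths(values)

print(per_chunk)  # [0, 1, 0, 0, 0, 0, 0, 1]
print(whole)      # [0, 1, 2, 0, 1, 0, 0, 1]
```

The two results differ on 2012-01-03 and 2012-01-05, exactly the rows whose run started in an earlier chunk.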

It may be better to do this with straight-up Java code.
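A minimal sketch of that single-pass approach, written here in Python rather than Java to keep it short. It assumes the rows have already been fetched from the database as `(day, value)` pairs; `add_duration` is a hypothetical name, and treating a missing calendar day as the end of a run is my assumption, not something the question specifies. Durations are counted as non-negative numbers.

```python
from datetime import date, timedelta


def add_duration(rows):
    """Return (day, value, duration) tuples for rows given as (day, value) pairs."""
    out = []
    prev_day = None
    prev_value = None
    duration = 0
    for day, value in sorted(rows):  # a sequential scan needs the rows in day order
        follows_prev = prev_day is not None and day - prev_day == timedelta(days=1)
        if value != 0 and follows_prev and value == prev_value:
            duration += 1  # the run continues
        else:
            duration = 0   # value changed, value is 0, or a day is missing
        out.append((day, value, duration))
        prev_day, prev_value = day, value
    return out


rows = [(date(2012, 1, d), v)
        for d, v in enumerate([1, 1, 1, -1, -1, 0, 1, 1], start=1)]

for day, value, duration in add_duration(rows):
    print(day.isoformat(), value, duration)
```

Because the scan carries only the previous day, the previous value and a counter, it runs in one pass over the sorted data and never needs to hold more than one row of state.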

Upvotes: 1
