Reputation: 1052
I'm quite new to MapReduce design. I use MongoDB as both the backend and the MapReduce engine.
On a simple dataset like:
day, value
where value is -1, 0, or 1, I want to add a duration to each row, where duration is the number of consecutive days the value has been equal to 1 or -1.
Example input data set:
day| value
2012-01-01| 1
2012-01-02| 1
2012-01-03| 1
2012-01-04| -1
2012-01-05| -1
2012-01-06| 0
2012-01-07| 1
2012-01-08| 1
Output should be:
day| value | Duration
2012-01-01| 1 | 0
2012-01-02| 1 | 1
2012-01-03| 1 | 2
2012-01-04| -1 | 0
2012-01-05| -1 |-1
2012-01-06| 0 | 0
2012-01-07| 1 | 0
2012-01-08| 1 | 1
Is this feasible in a MapReduce job?
Upvotes: 1
Views: 197
Reputation: 381
Someone correct me if I'm wrong, but this doesn't look feasible for MapReduce. I'm not sure how MongoDB partitions its input among its mappers, but if I remember correctly, tasks that rely on knowledge of data outside a single mapper's chunk are not possible in MapReduce.
It's possible for MR to do this job within a single chunk. Say that days 01/01 to 01/02 (from your example) are sent to one mapper. That mapper can certainly detect that the two days have the same value in a row.
However, what if another mapper gets days 01/03 to 01/04? This mapper won't know that days 01/01 and 01/02 have the same value as day 01/03, so it will output a duration of 0 for that day. There's no way to get the data from a different mapper, as far as I can see.
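To make the boundary problem concrete, here is a minimal sketch in plain Java. The split into two chunks is hypothetical and only mirrors the scenario above; it says nothing about how MongoDB really partitions its input. The class and method names are made up for illustration, and the sign convention (duration multiplied by the value) is an assumption read off the table in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkBoundaryDemo {
    // Computes durations for one chunk in isolation, the way a mapper that
    // sees only its own rows would. Assumes rows are sorted by day.
    static List<Integer> durationsWithinChunk(int[] chunk) {
        List<Integer> result = new ArrayList<>();
        int previous = 0; // value on the previous day of this chunk (0 = none yet)
        int streak = 0;   // days the current non-zero value has already lasted
        for (int value : chunk) {
            streak = (value != 0 && value == previous) ? streak + 1 : 0;
            result.add(value * streak); // signed duration (assumption)
            previous = value;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] values = {1, 1, 1, -1}; // days 01/01 to 01/04 from the question
        int[] first = Arrays.copyOfRange(values, 0, 2);  // hypothetical chunk: 01/01-01/02
        int[] second = Arrays.copyOfRange(values, 2, 4); // hypothetical chunk: 01/03-01/04

        System.out.println(durationsWithinChunk(first));  // [0, 1]
        System.out.println(durationsWithinChunk(second)); // [0, 0]
        System.out.println(durationsWithinChunk(values)); // [0, 1, 2, 0]
    }
}
```

Processed alone, the second chunk reports 0 for day 01/03, while the pass over all four days reports 2, which is the value the question expects.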
It may be better to do this with straight-up Java code.
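A single sequential pass could look roughly like the sketch below. It assumes the rows have already been fetched and sorted by day with no missing days, and it leaves out the MongoDB query entirely. The names `RunDuration` and `durations` are hypothetical. The question's table shows -1 as the duration on 2012-01-05, which this sketch interprets as a signed count; if an unsigned count is wanted, add `streak` instead of `value * streak`.

```java
import java.util.ArrayList;
import java.util.List;

class RunDuration {
    // Returns one duration per input value. Assumes the values are ordered
    // by day and that consecutive entries are consecutive days.
    static List<Integer> durations(int[] values) {
        List<Integer> result = new ArrayList<>();
        int previous = 0; // value seen on the previous day (0 = no active run)
        int streak = 0;   // days the current non-zero value has already lasted
        for (int value : values) {
            if (value != 0 && value == previous) {
                streak++;   // same non-zero value as yesterday: the run continues
            } else {
                streak = 0; // value changed or is 0: the run restarts
            }
            result.add(value * streak); // signed duration (assumption, see above)
            previous = value;
        }
        return result;
    }

    public static void main(String[] args) {
        // The value column from the question, 2012-01-01 through 2012-01-08.
        int[] values = {1, 1, 1, -1, -1, 0, 1, 1};
        System.out.println(durations(values)); // [0, 1, 2, 0, -1, 0, 0, 1]
    }
}
```

Because each row depends only on the row before it, the whole computation needs one loop and two variables of state, which is why a sequential pass is a natural fit here.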
Upvotes: 1