Reputation: 1848
I have a question about map/reduce and whether it would be a good fit for a task i'm trying to accomplish. I'm still in the design phase, so I can alter the structure of the data if needed.
Imagine I had two documents like this:
{ "_id" : ObjectId( "50231b4f8be6e5f1f2e7e24g" ),
"customer" : "50bac36bb4e54678170002e6",
"display" : "Motorola Canopy 100",
"description" : "a radio" }
{ "_id" : ObjectId( "50232fac8be6e4f1f2e7e259" ),
"display" : "Motorola POE",
"description" : "a power injector",
"device" : "50231b4f8be6e4f1f2e7e24g" }
On one document, there is a customer id reference ("customer" : "50bac36bb4e54678170002e6") and on another document there is a device id reference ("device" : "50231b4f8be6e4f1f2e7e24g"). Both of these documents represent a device of some kind. The device will either be related (aka, assigned) to a customer or related to another device. For example, a customer will get a radio and it will be assigned as such, but the power injector will be assigned to the radio itself, not the customer. I did this so I could better understand operational dependencies between devices. Maintaining this tiered association pattern is useful and I would like to keep it if possible.
I currently run a series of queries that first finds all of the devices and then for each device found, runs another query looking for associated devices. It's a recursive operation that can run as deep as there are associations to any given device.
I'm curious to see if this problem would be a good fit for a map/reduce operation. Should I stick with my current recursive op or is this a good/perfect fit for map/reduce?
Further clarification from questions posed in comments:
Upvotes: 1
Views: 190
Reputation: 16685
MapReduce works best with documents that don't have dependent relationships. If your data structure is just a directed acyclic graph (DAG) where each customer is a root node, and each device might have multiple parent nodes. Map/Reduce isn't a great computational fit for most DAG operations, because of the dependent relationships.
However since your data is tree-like (no multiple parents) you can add an array of ancestors as in the MongoDB docs example. In this case MR might be a good fit for some operations.
If I understand your query strategy, I'm concerned about the efficiency of doing lots of query operations in the inner loops of each recursion level. To help this along, ideally the "device" should be an ObjectId type rather than a string, and it should be indexed. An alternative to indexing could also be to store the children ObjectId's as child references per this MongoDB docs example. In either case it's possible that running a MR or a "$group" aggregation operation for each recursion level could help, but I suspect that in most cases it would just shift the computation from the DB client to the DB server.
Upvotes: 3