Sneaky Wombat
Sneaky Wombat

Reputation: 1848

map reduce suitability

I have a question about map/reduce and whether it would be a good fit for a task i'm trying to accomplish. I'm still in the design phase, so I can alter the structure of the data if needed.

Imagine I had two documents like this:

{ "_id" : ObjectId( "50231b4f8be6e5f1f2e7e24g" ),
  "customer" : "50bac36bb4e54678170002e6",
  "display" : "Motorola Canopy 100",
  "description" : "a radio" }

{ "_id" : ObjectId( "50232fac8be6e4f1f2e7e259" ),
  "display" : "Motorola POE",
  "description" : "a power injector",
  "device" : "50231b4f8be6e4f1f2e7e24g" }

On one document, there is a customer id reference ("customer" : "50bac36bb4e54678170002e6") and on another document there is a device id reference ("device" : "50231b4f8be6e4f1f2e7e24g"). Both of these documents represent a device of some kind. The device will either be related (aka, assigned) to a customer or related to another device. For example, a customer will get a radio and it will be assigned as such, but the power injector will be assigned to the radio itself, not the customer. I did this so I could better understand operational dependencies between devices. Maintaining this tiered association pattern is useful and I would like to keep it if possible.

I currently run a series of queries that first finds all of the devices and then for each device found, runs another query looking for associated devices. It's a recursive operation that can run as deep as there are associations to any given device.

I'm curious to see if this problem would be a good fit for a map/reduce operation. Should I stick with my current recursive op or is this a good/perfect fit for map/reduce?

Edit 1

Further clarification from questions posed in comments:

  1. Is it true that all of these documents are in the same collection? Yes There is a single "device" collection. The customer IDs though are from a different collection. For the purposes of this query though, I only use that ID to group the results from the device collection. No "joins", to borrow an sql term, are done, nor are they needed. Think of the customer key and the device key as attributes that should be sorted on.
  2. Also you say that you run queries to "find all of the devices". Do you mean "finds all of the devices given a customer id?" Yes: the first query executed is based off a key, the customer id. It's basically like this: db.devices.find({"customer" : "50bac36bb4e54678170002e6"}). I then take that result and iterate through it, using straight javascript to sort through the hash and fire off subqueries. I then am left with some new queries to run, basically all of the _ids captured with the first query are then queried with something like this: db.devices.find({_id:"50bac36bb4e54678170002e6"}). That query will yield new documents not discovered during the first query. I continue this sub query step until db.devices.find doesn't yield any new results. I then run more javascript on those results and merge them into the first, building a hash tree. That's how I know how to build the tree.
  3. Is it true that every device flows "up" to one customer? Yes
  4. Can devices potentially belong to more than one parent device or customer? No

Upvotes: 1

Views: 190

Answers (2)

mjhm
mjhm

Reputation: 16685

MapReduce works best with documents that don't have dependent relationships. If your data structure is just a directed acyclic graph (DAG) where each customer is a root node, and each device might have multiple parent nodes. Map/Reduce isn't a great computational fit for most DAG operations, because of the dependent relationships.

However since your data is tree-like (no multiple parents) you can add an array of ancestors as in the MongoDB docs example. In this case MR might be a good fit for some operations.

If I understand your query strategy, I'm concerned about the efficiency of doing lots of query operations in the inner loops of each recursion level. To help this along, ideally the "device" should be an ObjectId type rather than a string, and it should be indexed. An alternative to indexing could also be to store the children ObjectId's as child references per this MongoDB docs example. In either case it's possible that running a MR or a "$group" aggregation operation for each recursion level could help, but I suspect that in most cases it would just shift the computation from the DB client to the DB server.

Upvotes: 3

coderLMN
coderLMN

Reputation: 3076

I don't think map/reduce is a good solution here, because the data are interdependent, which render concurrency ineffective. Alternatively, you can use dbRef to embed related device so that it can be accessed directly from the related device document.

Upvotes: 1

Related Questions