Reputation: 740
I am new to MongoDB and I have difficulties implementing a solution in it. Consider a case where I have two collections: a client and sales collection with such designs
Client
==========
id
full name
mobile
gender
region
emp_status
occupation
religion
Sales
===========
id
client_id //this would be a DBRef
trans_date //date time value
products //an array of collections of product sold in the form {product_code, description, units, unit price, amount}
total sales
Now there is a requirement to develop another collection for analytical queries where the following questions can be answered
I considered implementing a very denormalized collection to create a flat and wide collection of the properties of the sales and client collection so that I can use map-reduce to further answer the questions. In RDBMS, an aggregation back by a join would answer these question but I am at loss to how to make Map-Reduce or Agregation help out.
Questions: How do I implement Map-Reduce to map across 2 collections? Is it possible to chain MapReduce operations?
Regards.
Upvotes: 1
Views: 658
Reputation: 69683
MongoDB does not do JOINs - period!
MapReduce always runs on a single collection. You can not have a single MapReduce job which selects from more than one collection. The same applies to aggregation.
When you want to do some data-mining (not MongoDBs strongest suit), you could create a denormalized collection of all Sales
with the corresponding Client
object embedded. You will have to write a little program or script which iterates over all clients and
Sales
documents for the clinetClient
into each documentWhen your Client
document is small and doesn't change often, you might consider to always embed it into each Sales
. This means that you will have redundant data, which looks very evil from the viewpoint of a seasoned RDB veteran. But remember that MongoDB is not a relational database, so you should not apply all RDBMS dogmas unreflected. The "no redundancy" rule of database normalization is only practicable when JOINs are relatively inexpensive and painless, which isn't the case with MongoDB. Besides, sometimes you might want redundancy to ensure data persistence. When you want to know your historical development of sales by region, you want to know the region where the customer resided when they bought the product, not where they reside now. When each Sale
only references the current Client
document, that information is lost. Sure, you can solve this with separate Address
documents which have date-ranges, but that would make it even more complicated.
Another option would be to embed an array of Sales
in each Client
. However, MongoDB doesn't like documents which grow over time, so when your clients tend to return often, this might result in sub-par write-performance.
Upvotes: 2