mjhinch
mjhinch

Reputation: 791

Aggregate and Sum Data from mutliple MongoDB Collections filtered by date range

I have data across three collections and need to produce a data set which aggregates data from these collections, and filters by a date range.

The collections are:

db.games
{
 _id : ObjectId,
 startTime : MongoDateTime
}

db.entries
{
 player_id : ObjectId,   // refers to db.players['_id']
 game_id : ObjectId      // refers to db.games['_id']
}

db.players
{
 _id : ObjectId,
 screen_name,
 email
}

I want to return a collection which is number of entries by player for games within a specified range. Where the output should look like:

output
{
  player_id,
  screen_name,
  email,
  sum_entries
}

I think I need to start by creating a collection of games within the date range, combined with all the entries and then aggregate over count of entries, and finally output collection with the player data, it's seems a lot of steps and I'm not sure how to go about this.

Upvotes: 0

Views: 157

Answers (1)

Philipp
Philipp

Reputation: 69683

The reason why you have these problems is because you try to use MongoDB like a relational database, not like a document-oriented database. Normalizing your data over many collections is often counter-productive, because MongoDB can not perform any JOIN-operations. MongoDB works much better when you have nested documents which embed other objects in arrays instead of referencing them. A better way to organize that data in MongoDB would be to either have each game have an array of players which took part in it or to have an array in each player with the games they took part in. It's also not necessarily a mistake to have some redundant additional data in these arrays, like the names and not just the ID's.

But now you have the problem, so let's see how we can deal with it.

As I said, MongoDB doesn't do JOINs. There is no way to access data from more than one collection at a time.

One thing you can do is solving the problem programmatically. Create a program which fetches all players, then all entries for each player, and then the games referenced by the entries where startTimematches.

Another thing you could try is MapReduce. MapReduce can be used to append results to another collection. You could try to use one MapReduce job for each of the relevant collections into one and then query the resulting collection.

Upvotes: 1

Related Questions