Reputation: 1320
{
"_id" : ObjectId("59b7d232cb4ddc345c3bcef4"),
"user_device_id" : ObjectId("59b7d21017c8a62f3a40c0bf"),
"generated_at" : ISODate("2017-09-12T12:24:48.182Z"),
"game_id" : ObjectId("59b683d4dd30770001513c75"),
"device_type" : "iPhone 8 android sdk",
"session_duration": 1000, /* in milliseconds */
"device_os" : "iOS android sdk",
"event_type" : "level_increase" /* new_user, game_session, ....*/
} // events collection
{
"_id" : ObjectId("59b7d21017c8a62f3a40c0bf"),
"generated_at" : ISODate("2017-09-12T12:24:48.182Z"),
"game_id" : ObjectId("59b683d4dd30770001513c75"),
"device_type" : "iPhone 8 android sdk",
"device_os" : "iOS android sdk"
} // user_devices collection
I have an 'events' collection with the structure shown above. I am using MongoDB as a big-data analytics platform and want aggregation queries to run in milliseconds, or at worst within a couple of seconds. Which fields should I index? The total number of documents will probably be in the tens to hundreds of billions.
The aggregation queries will basically be bounded by two dates: user retention per game, average user session per game, total users per game, and queries filtered by device_type or device_os per game.
At the moment I only have the date field indexed.
Here is the example of a Day 1 user retention query:
var startTime = Date.now(); // was referenced below but never defined

UserDevice.aggregate([
  {
    $match: {
      generated_at: { $gte: first_date, $lt: end_date },
      game_id: "some game_id"
    }
  },
  {
    $lookup: {
      from: "events",
      localField: "_id",
      foreignField: "user_device_id",
      as: "event_docs"
    }
  },
  {
    $group: {
      // Group by the calendar day one day (in ms) after sign-up
      _id: {
        day:   { $dayOfMonth: { $add: ["$generated_at", 1000 * 3600 * 24 * 1] } },
        month: { $month:      { $add: ["$generated_at", 1000 * 3600 * 24 * 1] } },
        year:  { $year:       { $add: ["$generated_at", 1000 * 3600 * 24 * 1] } }
      },
      total_users: { $sum: 1 },
      returned_users: {
        $sum: {
          $cond: {
            if: {
              // User counts as "returned" if any non-new_user event falls on day 1
              $eq: [
                {
                  $filter: {
                    input: "$event_docs",
                    as: "ed",
                    cond: {
                      $and: [
                        { $eq: [{ $dayOfMonth: { $add: ["$generated_at", 1000 * 3600 * 24 * 1] } }, { $dayOfMonth: "$$ed.generated_at" }] },
                        { $eq: [{ $month:      { $add: ["$generated_at", 1000 * 3600 * 24 * 1] } }, { $month:      "$$ed.generated_at" }] },
                        { $eq: [{ $year:       { $add: ["$generated_at", 1000 * 3600 * 24 * 1] } }, { $year:       "$$ed.generated_at" }] },
                        { $ne: ["$$ed.event_type", "new_user"] }
                      ]
                    }
                  }
                },
                []
              ]
            },
            then: 0,
            else: 1
          }
        }
      }
    }
  },
  {
    $sort: { "_id.year": 1, "_id.month": 1, "_id.day": 1 }
  }
]).exec(function (err, results) {
  if (err) throw err;
  var latency = Date.now() - startTime;
  console.log("RETENTION RESULTS", JSON.stringify(results), "| latency:", latency, "ms");
});
Mongo Version: 3.4.7
Upvotes: 0
Views: 192
Reputation: 6385
First of all, indexes are fully usable by the aggregation pipeline only from MongoDB 3.2 onward:

Changed in version 3.2: Starting in MongoDB 3.2, indexes can cover an aggregation pipeline. In MongoDB 2.6 and 3.0, indexes could not cover an aggregation pipeline since even when the pipeline uses an index, aggregation still requires access to the actual documents.
https://docs.mongodb.com/manual/core/aggregation-pipeline/#pipeline-operators-and-indexes

Make sure you are running this version or newer.
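A quick way to check the server version from the mongo shell (the question mentions 3.4.7, which is fine):

```js
// Returns the server version string, e.g. "3.4.7"
db.version()
```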
In general, you can create indexes on all fields that appear in the $match stage of your queries.
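For the retention query in the question, that would be a compound index on the $match fields (equality field first, range field second), plus an index on the $lookup foreign field; a sketch, assuming the collection names user_devices and events from the question's schema:

```js
// Supports {$match: {game_id: ..., generated_at: {$gte: ..., $lt: ...}}}
// on the user_devices collection: equality (game_id) before range (generated_at).
db.user_devices.createIndex({ game_id: 1, generated_at: 1 })

// The $lookup probes events by user_device_id for every input document,
// so an index on the foreignField is essential at this scale.
db.events.createIndex({ user_device_id: 1 })
```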
I would also suggest going through the 'Aggregation Pipeline Optimization' section of the MongoDB docs; it might be helpful.
For example, you can use the explain option to see how a query is executed and understand whether it can be optimized.
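For an aggregation, explain is passed as an option to aggregate() in the mongo shell; the pipeline itself is elided here (use the one from the question):

```js
// Shows the winning plan, including whether the $match stage uses an index
db.user_devices.aggregate(
  [ /* ...the retention pipeline from the question... */ ],
  { explain: true }
)
```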
Upvotes: 1