Reputation: 674
I'm trying to perform date range query in MongoDB. So let's say I want to find records which were created in 'May 2018'. Now as per knowledge we can achieve this by following ways.
One is to do simple range query for createdAt. Ex.
{createdAt: { $gte: ISODate('2018-05-01'), $lte: ISODate('2018-05-31')}}
Another way is an aggregation
db.collection.aggregate([{ $project: { year: { $year: '$createdAt' }, month: { $month: '$createdAt' }},
{ $match: { year: 2018, month: 5}}])
So When to use aggregation if we can achieve the result by a simple query? Please give corrections if the above query or approach is incorrect or old. I'm testing this in Compass.
Thanks
Upvotes: 0
Views: 523
Reputation: 7578
It has been my experience that very often shortly after you do a basic find()
, your needs change and require the power of the pipeline. For example, consider the following docs:
{ "_id" : 0, "date" : ISODate("2018-01-10T00:00:00Z") }
{ "_id" : 1, "date" : ISODate("2018-01-07T00:00:00Z") }
{ "_id" : 2, "date" : ISODate("2018-01-03T00:00:00Z") }
{ "_id" : 3 }
{ "_id" : 4, "date" : ISODate("2018-01-20T00:00:00Z") }
{ "_id" : 5 }
{ "_id" : 6 }
{ "_id" : 7, "date" : ISODate("2018-01-18T00:00:00Z") }
{ "_id" :8, "date" : ISODate("2018-01-10T00:00:00Z") }
We want to find all where date <= 2018-01-15:
db.foo.aggregate([
{$match: {"date": {$lte: new ISODate("2018-01-15")}} }
]);
{ "_id" : 0, "date" : ISODate("2018-01-10T00:00:00Z") }
{ "_id" : 1, "date" : ISODate("2018-01-07T00:00:00Z") }
{ "_id" : 2, "date" : ISODate("2018-01-03T00:00:00Z") }
{ "_id" : 8, "date" : ISODate("2018-01-10T00:00:00Z") }
Ooops! We want the blank dates too:
db.foo.aggregate([
{$match: {"$or": [ {"date": {$lte: new ISODate("2018-01-15")}}, {"date": {$exists: false}} ] }}
]);
{ "_id" : 0, "date" : ISODate("2018-01-10T00:00:00Z") }
{ "_id" : 1, "date" : ISODate("2018-01-07T00:00:00Z") }
{ "_id" : 2, "date" : ISODate("2018-01-03T00:00:00Z") }
{ "_id" : 3 }
{ "_id" : 5 }
{ "_id" : 6 }
{ "_id" : 8, "date" : ISODate("2018-01-10T00:00:00Z") }
And we want it sorted:
db.foo.aggregate([
{$match: {"$or": [ {"date": {$lte: new ISODate("2018-01-15")}}, {"date": {$exists: false}} ] }}
,{$sort: {"date":1}}
]);
{ "_id" : 3 }
{ "_id" : 5 }
{ "_id" : 6 }
{ "_id" : 2, "date" : ISODate("2018-01-03T00:00:00Z") }
{ "_id" : 1, "date" : ISODate("2018-01-07T00:00:00Z") }
{ "_id" : 0, "date" : ISODate("2018-01-10T00:00:00Z") }
{ "_id" : 8, "date" : ISODate("2018-01-10T00:00:00Z") }
Hmmm. But we want the blanks to appear at the END of the sorted list. So we overwrite the date
field with itself or if null, a date really far out, which now gives us the sequence we seek:
db.foo.aggregate([
{$match: {"$or": [ {"date": {$lte: new ISODate("2018-01-15")}}, {"date": {$exists: false}} ] }}
,{$addFields: {"date": {$ifNull: [ "$date", new ISODate("3000-01-01")] }}}
,{$sort: {"date":1}}
]);
{ "_id" : 2, "date" : ISODate("2018-01-03T00:00:00Z") }
{ "_id" : 1, "date" : ISODate("2018-01-07T00:00:00Z") }
{ "_id" : 0, "date" : ISODate("2018-01-10T00:00:00Z") }
{ "_id" : 8, "date" : ISODate("2018-01-10T00:00:00Z") }
{ "_id" : 3, "date" : ISODate("3000-01-01T00:00:00Z") }
{ "_id" : 5, "date" : ISODate("3000-01-01T00:00:00Z") }
{ "_id" : 6, "date" : ISODate("3000-01-01T00:00:00Z") }
There are several variations on this theme such as doing an initial $project
to create a common non-null sorting field but by putting $match
right up front we can take advantage of indexes if they exist. The true power of the agg pipeline reveals itself when your documents have array data that requires querying and manipulation.
Upvotes: 1
Reputation: 10918
You don't need the $project
stage (in fact, the way you wrote your projection your output documents will only contain the projected values and the _id
field which may or may not be what you want). So what's left of is your aggregation query is just a $match
. The query execution times can be assumed to be identical except for a minor penalty for the stage handling. Also, I personally expect the "normal" find()
to pretty much converge into the aggregation framework internally if that hasn't even fully happened yet anyway. It just doesn't make sense to keep two ways to query data...
For clarity reasons, I would personally lean towards the find()
version, though.
Upvotes: 1
Reputation: 430
If you can do it with a simple query, just keep it simple. There are many frameworks which give an easy support for querying but not to aggregations.
Aggregations are harder to write and to maintain. In some cases, you really need them (like when dealing with arrays or subdocuments) but if not, keep it with a find(). An example about when aggregations are needed:
MongoDB - Get latest non-null field value from documents with timestamp
Also, I am not sure if your aggregation code does the same that the find() code. If you check well, you are using $project stage first, and then you are matching. That means you are projecting all the objects of the DB and then matching among them because you didn't start the aggregation with a $match. This can be veeery slow if you have many objects.
Also, $match only uses indexes when it is in the first stage:
Place the $match as early in the aggregation pipeline as possible. Because $match limits the total number of documents in the aggregation pipeline, earlier $match operations minimize the amount of processing down the pipe. If you place a $match at the very beginning of a pipeline, the query can take advantage of indexes like any other db.collection.find() or db.collection.findOne().
Upvotes: 1