parth
parth

Reputation: 674

MongoDB - Is it wise to use Aggregation instead of normal range query of Date range?

I'm trying to perform date range query in MongoDB. So let's say I want to find records which were created in 'May 2018'. Now as per knowledge we can achieve this by following ways.

  1. One is to do simple range query for createdAt. Ex. {createdAt: { $gte: ISODate('2018-05-01'), $lte: ISODate('2018-05-31')}}

  2. Another way is an aggregation db.collection.aggregate([{ $project: { year: { $year: '$createdAt' }, month: { $month: '$createdAt' }}, { $match: { year: 2018, month: 5}}])

So When to use aggregation if we can achieve the result by a simple query? Please give corrections if the above query or approach is incorrect or old. I'm testing this in Compass.

Thanks

Upvotes: 0

Views: 523

Answers (3)

Buzz Moschetti
Buzz Moschetti

Reputation: 7578

It has been my experience that very often shortly after you do a basic find(), your needs change and require the power of the pipeline. For example, consider the following docs:

{ "_id" : 0, "date" : ISODate("2018-01-10T00:00:00Z") }
{ "_id" : 1, "date" : ISODate("2018-01-07T00:00:00Z") }
{ "_id" : 2, "date" : ISODate("2018-01-03T00:00:00Z") }
{ "_id" : 3 }
{ "_id" : 4, "date" : ISODate("2018-01-20T00:00:00Z") }
{ "_id" : 5 }
{ "_id" : 6 }
{ "_id" : 7, "date" : ISODate("2018-01-18T00:00:00Z") }
{ "_id"  :8, "date" : ISODate("2018-01-10T00:00:00Z") }

We want to find all where date <= 2018-01-15:

db.foo.aggregate([
{$match: {"date": {$lte: new ISODate("2018-01-15")}} }
                ]);

{ "_id" : 0, "date" : ISODate("2018-01-10T00:00:00Z") }
{ "_id" : 1, "date" : ISODate("2018-01-07T00:00:00Z") }
{ "_id" : 2, "date" : ISODate("2018-01-03T00:00:00Z") }
{ "_id" : 8, "date" : ISODate("2018-01-10T00:00:00Z") }

Ooops! We want the blank dates too:

db.foo.aggregate([
{$match: {"$or": [ {"date": {$lte: new ISODate("2018-01-15")}}, {"date": {$exists: false}} ] }}
]);

{ "_id" : 0, "date" : ISODate("2018-01-10T00:00:00Z") }
{ "_id" : 1, "date" : ISODate("2018-01-07T00:00:00Z") }
{ "_id" : 2, "date" : ISODate("2018-01-03T00:00:00Z") }
{ "_id" : 3 }
{ "_id" : 5 }
{ "_id" : 6 }
{ "_id" : 8, "date" : ISODate("2018-01-10T00:00:00Z") }

And we want it sorted:

db.foo.aggregate([
{$match: {"$or": [ {"date": {$lte: new ISODate("2018-01-15")}}, {"date": {$exists: false}} ] }}
,{$sort: {"date":1}}
]);

{ "_id" : 3 }
{ "_id" : 5 }
{ "_id" : 6 }
{ "_id" : 2, "date" : ISODate("2018-01-03T00:00:00Z") }
{ "_id" : 1, "date" : ISODate("2018-01-07T00:00:00Z") }
{ "_id" : 0, "date" : ISODate("2018-01-10T00:00:00Z") }
{ "_id" : 8, "date" : ISODate("2018-01-10T00:00:00Z") }

Hmmm. But we want the blanks to appear at the END of the sorted list. So we overwrite the date field with itself or if null, a date really far out, which now gives us the sequence we seek:

db.foo.aggregate([
{$match: {"$or": [ {"date": {$lte: new ISODate("2018-01-15")}}, {"date": {$exists: false}} ] }}
,{$addFields: {"date": {$ifNull: [ "$date", new ISODate("3000-01-01")] }}}
,{$sort: {"date":1}}
]);

{ "_id" : 2, "date" : ISODate("2018-01-03T00:00:00Z") }
{ "_id" : 1, "date" : ISODate("2018-01-07T00:00:00Z") }
{ "_id" : 0, "date" : ISODate("2018-01-10T00:00:00Z") }
{ "_id" : 8, "date" : ISODate("2018-01-10T00:00:00Z") }
{ "_id" : 3, "date" : ISODate("3000-01-01T00:00:00Z") }
{ "_id" : 5, "date" : ISODate("3000-01-01T00:00:00Z") }
{ "_id" : 6, "date" : ISODate("3000-01-01T00:00:00Z") }

There are several variations on this theme such as doing an initial $project to create a common non-null sorting field but by putting $match right up front we can take advantage of indexes if they exist. The true power of the agg pipeline reveals itself when your documents have array data that requires querying and manipulation.

Upvotes: 1

dnickless
dnickless

Reputation: 10918

You don't need the $project stage (in fact, the way you wrote your projection your output documents will only contain the projected values and the _id field which may or may not be what you want). So what's left of is your aggregation query is just a $match. The query execution times can be assumed to be identical except for a minor penalty for the stage handling. Also, I personally expect the "normal" find() to pretty much converge into the aggregation framework internally if that hasn't even fully happened yet anyway. It just doesn't make sense to keep two ways to query data...

For clarity reasons, I would personally lean towards the find() version, though.

Upvotes: 1

Juan Bermudez
Juan Bermudez

Reputation: 430

If you can do it with a simple query, just keep it simple. There are many frameworks which give an easy support for querying but not to aggregations.

Aggregations are harder to write and to maintain. In some cases, you really need them (like when dealing with arrays or subdocuments) but if not, keep it with a find(). An example about when aggregations are needed:

MongoDB - Get latest non-null field value from documents with timestamp

Also, I am not sure if your aggregation code does the same that the find() code. If you check well, you are using $project stage first, and then you are matching. That means you are projecting all the objects of the DB and then matching among them because you didn't start the aggregation with a $match. This can be veeery slow if you have many objects.

Also, $match only uses indexes when it is in the first stage:

Place the $match as early in the aggregation pipeline as possible. Because $match limits the total number of documents in the aggregation pipeline, earlier $match operations minimize the amount of processing down the pipe. If you place a $match at the very beginning of a pipeline, the query can take advantage of indexes like any other db.collection.find() or db.collection.findOne().

Upvotes: 1

Related Questions