Reputation: 17362
I need to find all datasets in my MongoDB with an expired date value. Expired means that the last array element's timestamp is older than the current timestamp plus a defined interval (which is defined by a category).
Every dataset has a field like this:
{
"field" : [
{
"category" : 1,
"date" : ISODate("2019-03-01T12:00:00.464Z")
},
{
"category" : 1,
"date" : ISODate("2019-03-01T14:52:50.464Z")
}
]
}
The category defines a time interval. For example 'category 1' stands for 90 minutes, 'category 2' for 120 minutes.
Now I need to get every dataset whose date value is expired, meaning the last array element has a value older than 90 minutes before the current timestamp.
Something like
Content.find({ 'field.$.date': { $gt: new Date() } })
But with that attempt I have two problems:
Upvotes: 1
Views: 105
Reputation: 151200
Let's break down the problem into parts.
Part 1: Logical and Fast
A quick perusal of MongoDB query operators related to arrays should tell you that you can in fact always query an array element based on its index position. This is very simple to do for the "first" array element, since that position is always 0:
{ "field.0.date": { "$lt": new Date("2019-03-01T10:30:00.464Z") } }
Logically the "last" position would be -1, but you cannot actually use that value in this notation with MongoDB, as it would be considered invalid.
However, what you can do here instead is add new items to the array so that they are prepended to the beginning of the array rather than appended to the end. This means your array content is essentially "reversed" and is then easy to access as shown above. This is what the $position modifier to $push does for you:
collection.updateOne(
{ "_id": documentId },
{
"$push": {
"field": {
"$each": [{ "category": 1, "date": new Date("2019-03-02") }],
"$position": 0
}
}
}
)
So that means newly added items go to the beginning rather than the end. That may be practical but it does mean you would need to re-order all your existing array items.
In the case where the "date" is static and basically never changes once you write the array item (i.e. you never update the date for a matched array item), then you can actually re-order by sorting on that "date" property in a single update statement, using the $sort modifier:
collection.updateMany(
{},
{ "$push": { "field": { "$each": [], "$sort": { "date": -1 } } } }
)
Whilst it might feel "odd" to use $push when you are not actually adding anything to the array, this is where the $sort modifier lives. The empty array argument "$each": [] essentially means "add nothing", yet the $sort applies to all current members of the array.
This could optionally be done much like the earlier example with $position, in which case the $sort would be applied on every write. However, as long as the "date" reflects the "timestamp when added" (as I suspect it does), it's probably more efficient to use the "$position": 0 approach instead of sorting every time something changes. It depends on your actual implementation and how you otherwise work with the data.
Part 2: Brute force, and slow
If, however, for whatever reason you really don't believe that "reversing" the content of the array is a practical solution, then the only other option is to effectively "calculate" the "last" array element by projecting that value with a supported operator.
The only practical way to do that is with the Aggregation Framework, specifically the $arrayElemAt operator:
collection.aggregate([
{ "$addFields": {
"lastDate": { "$arrayElemAt": [ "$field.date", -1 ] }
}}
])
Basically that is just going to look at the supplied array content (in this case just the "date" property values for each element) and then extract the value at the given index position. This operator allows the -1 index notation, meaning the "last" element in the array.
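For intuition, the -1 behaviour is the same as negative indexing from the end of an array in plain JavaScript (a rough analogue only, not the server's implementation):

```javascript
// Rough plain-JavaScript analogue of { "$arrayElemAt": [ array, idx ] }:
// non-negative indexes count from the front, negative ones from the end.
function arrayElemAt(arr, idx) {
  return idx < 0 ? arr[arr.length + idx] : arr[idx];
}
```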
Clearly this is not ideal, as the extraction is decoupled from the actual expression needed to query or filter the values. That comes in the next part, but you need to realize this iterates through your whole collection before we can even compare the values to see which documents you want to keep.
Part 1: Fast query logic
Following on from the above, the next criterion is based on the "category" field value. By the same logic just learned, you should conclude that "calculating" as you process data is bad news for performance. So the trick to apply here is to include the logic in the query expression itself, using different supplied "date" values depending on the "category" value being matched in the document.
The most simple application of this is with an $or expression:
var currentDateTime = new Date();
var ninetyMinsBefore = new Date(currentDateTime.valueOf() - (1000 * 60 * 90));
var oneTwentyMinsBefore = new Date(currentDateTime.valueOf() - (1000 * 60 * 120));
collection.find({
"$or": [
{
"field.0.category": 1,
"field.0.date": { "$lt": ninetyMinsBefore }
},
{
"field.0.category": 2,
"field.0.date": { "$lt": oneTwentyMinsBefore }
}
]
})
Note here that instead of adjusting the stored "date" by the variable interval and comparing that to the current date, you instead calculate the differences from the current date and then conditionally apply them depending on the value of "category".
This is the fast and efficient way: since you were able to re-order the array items as described above, we can simply apply the conditions to see whether that "first" element meets them.
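If the list of categories grows, the $or branches can be generated from a category-to-interval map rather than written out by hand; a sketch (the helper name and map shape are my assumptions, with the interval values taken from the question):

```javascript
// Map each category to its expiry interval in minutes
// (from the question: category 1 = 90 min, category 2 = 120 min).
const intervalsByCategory = { 1: 90, 2: 120 };

// Build the $or filter over the "first" (i.e. newest) array element,
// one branch per category with its own cutoff date.
function buildExpiryFilter(intervals, now) {
  return {
    "$or": Object.entries(intervals).map(([category, minutes]) => ({
      "field.0.category": Number(category),
      "field.0.date": { "$lt": new Date(now.valueOf() - 1000 * 60 * minutes) }
    }))
  };
}

// Usage (sketch):
// collection.find(buildExpiryFilter(intervalsByCategory, new Date()))
```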
Part 2: Slower forced calculation
collection.aggregate([
{ "$addFields": {
"lastDate": {
"$arrayElemAt": [ "$field.date", -1 ]
},
"lastCategory": {
"$arrayElemAt": [ "$field.category", -1 ]
}
}},
{ "$match": {
"$or": [
{ "lastCategory": 1, "lastDate": { "$lt": ninetyMinsBefore } },
{ "lastCategory": 2, "lastDate": { "$lt": oneTwentyMinsBefore } }
]
}}
])
Same basic premise here: even though you already needed to project values from the "last" array elements, there's no real need to adjust the stored "date" values with math, which would just complicate things further. The initial $addFields projection is the main cost; the $match at the bottom then performs the actual selection.
You could optionally use $expr with modern MongoDB releases, but it's basically the same thing:
collection.find({
"$expr": {
"$or": [
{
"$and": [
{ "$eq": [ { "$arrayElemAt": [ "$field.category", -1 ] }, 1 ] },
{ "$lt": [ { "$arrayElemAt": [ "$field.date", -1 ] }, ninetyMinsBefore ] }
]
},
{
"$and": [
{ "$eq": [ { "$arrayElemAt": [ "$field.category", -1 ] }, 2 ] },
{ "$lt": [ { "$arrayElemAt": [ "$field.date", -1 ] }, oneTwentyMinsBefore ] }
]
}
]
}
})
Worth noting are the special "aggregation" forms of $or and $and here, since everything within $expr is an aggregation expression that needs to resolve to a Boolean value of true/false.
Either way it's all the same problem: the initial "query only" examples are processed natively and can indeed use an index to speed up matching and results. None of these "aggregation expressions" can do that, and they thus run considerably slower.
NOTE: If you are storing "date" such that "expired" documents are the ones you want to select, then the comparison is "less than" the current date minus the interval, rather than "greater than" as you presented in your question. This means the current time minus the interval is the "greater" value in the selection, and anything that "expired" before that time is matched.
N.B. Normally, when you query for array elements matching multiple properties, you would use the $elemMatch operator so that those multiple conditions apply to the same specific array element. The only reason that does not apply here is the use of the explicit numeric index for the 0 position on each property. This means that rather than applying over the entire array (like "field.date"), each condition applies specifically to the 0 position only.
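To illustrate that difference in plain JavaScript (an analogue only, not how the server evaluates queries): an $elemMatch-style test passes if any element satisfies the predicate, whereas the "field.0" form only ever tests the first element:

```javascript
// $elemMatch-style: does ANY array element satisfy the predicate?
function anyElementMatches(field, pred) {
  return field.some(pred);
}

// "field.0"-style: does the element at position 0 satisfy it?
function firstElementMatches(field, pred) {
  return field.length > 0 && pred(field[0]);
}
```

With an array like [ { category: 1 }, { category: 2 } ] and a predicate matching category 2, the $elemMatch-style check would match the document while the position-0 check would not, which is exactly why the explicit 0 index matters here.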
Upvotes: 1