This is probably a long shot, but:
I'd like to group a set of time-series documents by the gaps between their dates: sort the documents ascending by date, then start a new partition whenever the interval between the current document and the previous one is above some threshold.
I can easily do this after fetching the documents, of course; in this example, each document is mapped to its date plus a partition number:
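// assuming docs is sorted ascending by date and minInterval is in milliseconds
var partition = 0;
var partitioned = docs.map((e, i) => {
    // start a new partition when the gap to the previous doc exceeds minInterval
    if (i > 0 && e.date - docs[i - 1].date > minInterval) partition++;
    return {
        date: e.date,
        partition: partition
    };
});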
But I don't actually need the documents themselves; I just need the first and last dates and the number of documents in each partition. What's unclear to me is how to express the partitioning step.
Is this possible with an aggregation? I see a possibly relevant MongoDB ticket that is still open, so I'm guessing not.
Yes, it is possible. To compare multiple documents you first need to put them all into a single array, using $group with null as the _id. Then, to compare neighbouring values, you need an index just like in a for loop; you can generate one with the $range operator.
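To make the indexing concrete, here is a plain JavaScript mirror of what $range and $arrayElemAt do. It is an illustration only, not MongoDB code; `range` is a hypothetical helper and the string labels stand in for real dates. One detail worth knowing: $arrayElemAt with a negative index counts from the end of the array (as `Array.prototype.at()` does), so at index 0 the "previous" element is actually the last one. That is why the aggregation below checks for index 0 explicitly.

```javascript
// Hypothetical helper mirroring { $range: [start, end] }: start..end-1.
function range(start, end) {
  return Array.from({ length: end - start }, (_, i) => start + i);
}

// Placeholder labels standing in for the sorted dates array.
const dates = ["18:30", "20:00", "20:10"];

// Pair each element with its predecessor, as the $let vars do.
const pairs = range(0, dates.length).map((index) => ({
  index,
  current: dates.at(index),
  // Like $arrayElemAt with index - 1: at index 0 this wraps to the LAST element.
  prev: dates.at(index - 1),
}));

console.log(pairs);
```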
To determine the partitions you need two $map expressions. The first one returns an array of 0 and 1 values, where 1 means that this date starts a new partition.
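A minimal sketch of the first $map in plain JavaScript, using the same seven timestamps as the sample documents in this answer. `partitionStartFlags` is a hypothetical name, and `minInterval` is assumed to be in milliseconds. Like the $cond in the pipeline, index 0 always gets 0 and a gap equal to or above the threshold gets 1.

```javascript
// Mirror of the first $map: 1 marks a date that starts a new partition.
// dates: Date objects sorted ascending; minInterval: milliseconds.
function partitionStartFlags(dates, minInterval) {
  return dates.map((current, index) => {
    // Same test as the pipeline's $cond: index 0, or a gap shorter than
    // minInterval, yields 0; anything else starts a new partition.
    if (index === 0 || current - dates[index - 1] < minInterval) return 0;
    return 1;
  });
}

const dates = [
  "2019-04-12T18:30:00Z", "2019-04-12T20:00:00Z", "2019-04-12T20:10:00Z",
  "2019-04-12T21:00:00Z", "2019-04-12T21:15:00Z", "2019-04-12T21:45:00Z",
  "2019-04-12T23:00:00Z",
].map((s) => new Date(s));

// returns [0, 1, 0, 1, 0, 1, 1] for a 20-minute threshold
console.log(partitionStartFlags(dates, 20 * 60 * 1000));
```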
The second $map merges the dates with their partition indexes. To get the partition index of a date, you can $sum the subarray ($slice) of zeros and ones up to and including that date's position.
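The second step can be sketched the same way: a date's partition index is the number of partition starts at or before it, which is what $sum over $slice: [flags, index + 1] computes. `partitionIndexes` is a hypothetical name; the input is the flag array from the sample data.

```javascript
// Mirror of the second $map: partition index = sum of flags[0..index],
// i.e. { $sum: { $slice: [flags, index + 1] } }.
function partitionIndexes(flags) {
  return flags.map((_, index) =>
    flags.slice(0, index + 1).reduce((sum, f) => sum + f, 0)
  );
}

// returns [0, 1, 1, 2, 2, 3, 4]
console.log(partitionIndexes([0, 1, 0, 1, 0, 1, 1]));
```

Re-summing every prefix is O(n²); in plain JavaScript a single running total would be O(n). The slice-and-sum form is kept here only because it mirrors the pipeline.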
For instance:
db.col.insertMany([
    { date: ISODate("2019-04-12T21:00:00.000Z") },
    { date: ISODate("2019-04-12T21:15:00.000Z") },
    { date: ISODate("2019-04-12T21:45:00.000Z") },
    { date: ISODate("2019-04-12T23:00:00.000Z") },
    { date: ISODate("2019-04-12T20:00:00.000Z") },
    { date: ISODate("2019-04-12T18:30:00.000Z") },
    { date: ISODate("2019-04-12T20:10:00.000Z") }
])
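With a threshold of 20 minutes (1200000 ms, the constant used in the $cond below), gaps shorter than 20 minutes stay in the same partition and a gap of 20 minutes or more starts a new one. You can run the following aggregation: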
db.col.aggregate([
    { $sort: { date: 1 } },
    // collect all (already sorted) dates into a single array
    { $group: { _id: null, dates: { $push: "$date" } } },
    {
        $addFields: {
            // 1 marks a date that starts a new partition, 0 otherwise
            partitions: {
                $map: {
                    input: { $range: [ 0, { $size: "$dates" } ] },
                    as: "index",
                    in: {
                        $let: {
                            vars: {
                                current: { $arrayElemAt: [ "$dates", "$$index" ] },
                                prev: { $arrayElemAt: [ "$dates", { $add: [ "$$index", -1 ] } ] }
                            },
                            in: {
                                $cond: [
                                    // 1200000 ms = 20 minutes
                                    { $or: [ { $eq: [ "$$index", 0 ] }, { $lt: [ { $subtract: [ "$$current", "$$prev" ] }, 1200000 ] } ] },
                                    0,
                                    1
                                ]
                            }
                        }
                    }
                }
            }
        }
    },
    {
        $project: {
            datesWithPartitions: {
                $map: {
                    input: { $range: [ 0, { $size: "$dates" } ] },
                    as: "index",
                    in: {
                        date: { $arrayElemAt: [ "$dates", "$$index" ] },
                        // partition index = sum of the flags up to and including this position
                        partition: { $sum: { $slice: [ "$partitions", { $add: [ "$$index", 1 ] } ] } }
                    }
                }
            }
        }
    }
])
which will print:
{
    "_id" : null,
    "datesWithPartitions" : [
        { "date" : ISODate("2019-04-12T18:30:00Z"), "partition" : 0 },
        { "date" : ISODate("2019-04-12T20:00:00Z"), "partition" : 1 },
        { "date" : ISODate("2019-04-12T20:10:00Z"), "partition" : 1 },
        { "date" : ISODate("2019-04-12T21:00:00Z"), "partition" : 2 },
        { "date" : ISODate("2019-04-12T21:15:00Z"), "partition" : 2 },
        { "date" : ISODate("2019-04-12T21:45:00Z"), "partition" : 3 },
        { "date" : ISODate("2019-04-12T23:00:00Z"), "partition" : 4 }
    ]
}
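The output above still lists every date, while the question only needs the first date, last date and count per partition. Server-side, one option (not part of this answer's pipeline) would be to append an $unwind on datesWithPartitions followed by a $group on the partition field using $min, $max and { $sum: 1 }. As a client-side sketch, the printed array can be reduced directly; `summarize` is a hypothetical helper, and it assumes the entries are already sorted ascending by date, as they are in this output.

```javascript
// Input: the datesWithPartitions array printed above.
const datesWithPartitions = [
  { date: new Date("2019-04-12T18:30:00Z"), partition: 0 },
  { date: new Date("2019-04-12T20:00:00Z"), partition: 1 },
  { date: new Date("2019-04-12T20:10:00Z"), partition: 1 },
  { date: new Date("2019-04-12T21:00:00Z"), partition: 2 },
  { date: new Date("2019-04-12T21:15:00Z"), partition: 2 },
  { date: new Date("2019-04-12T21:45:00Z"), partition: 3 },
  { date: new Date("2019-04-12T23:00:00Z"), partition: 4 },
];

// Group by partition, keeping the first and last date and a count.
// Relies on the entries being sorted ascending by date.
function summarize(entries) {
  const byPartition = new Map();
  for (const { date, partition } of entries) {
    const summary = byPartition.get(partition);
    if (summary) {
      // Later dates extend the existing partition.
      summary.last = date;
      summary.count++;
    } else {
      // First date seen for this partition.
      byPartition.set(partition, { partition, first: date, last: date, count: 1 });
    }
  }
  // Map preserves insertion order, so partitions come out in order.
  return [...byPartition.values()];
}

console.log(summarize(datesWithPartitions));
```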