Reputation: 299
What I am trying to achieve is to make $limit a bit smarter: I need to limit my aggregation to N evenly distributed documents, not just the first N documents. Is that doable?
My first guess is to use $sample if the count exceeds N. Pseudocode for N = 500:
{
  $match: { some_dummy_initial_condition }
},
{ // Conditional step:
  IF $count > 500 THEN $sample: { size: 500 }
}
Edit: If there is no way to skip a pipeline stage, maybe there is a way to use $limit so that the documents are uniformly distributed (e.g. keeping only every n-th document, so I would not exceed my upper limit N)?
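For illustration, a minimal sketch of that every-n-th-document idea, assuming MongoDB 5.0+ (for $setWindowFields / $documentNumber), a ts field to order by, and the placeholder filter from above; the stride comes from a separate count:

var filter = { /* some_dummy_initial_condition */ };
var total = db.collection.countDocuments(filter);
var stride = Math.max(1, Math.ceil(total / 500)); // keep every stride-th document

db.collection.aggregate([
  { $match: filter },
  // Number the matched documents 1..total in ts order.
  { $setWindowFields: {
      sortBy: { ts: 1 },
      output: { seq: { $documentNumber: {} } }
  } },
  // Keep documents 1, 1 + stride, 1 + 2*stride, ... (at most ~500).
  { $match: { $expr: { $eq: [{ $mod: [{ $subtract: ["$seq", 1] }, stride] }, 0] } } },
  { $limit: 500 } // safety net in case of rounding
])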
Upvotes: 0
Views: 346
Reputation: 36114
There is no option to conditionally skip a pipeline stage, but you can try one kind of hack. It is not efficient in memory or speed when the collection holds lots of data:
- $facet to create 2 separate arrays: data for all documents, and sample for 500 randomly sampled documents
- $project to check whether the size of data is greater than 500; if it is, return the sample array as data, otherwise return the data array
- $unwind to deconstruct the data array
- $replaceWith to promote the data object to the root
db.collection.aggregate([
  { $match: { some_dummy_initial_condition } },
  {
    $facet: {
      data: [],
      sample: [{ $sample: { size: 500 } }]
    }
  },
  {
    $project: {
      data: {
        $cond: [
          { $gt: [{ $size: "$data" }, 500] },
          "$sample",
          "$data"
        ]
      }
    }
  },
  { $unwind: "$data" },
  { $replaceWith: "$data" }
])
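For completeness, a hedged alternative sketch that avoids pushing the whole result set through $facet (whose output is subject to the 16 MB document limit): run a separate count first and build the pipeline conditionally on the client, at the cost of a second round trip:

var filter = { /* some_dummy_initial_condition */ };
var pipeline = [{ $match: filter }];
// Append $sample only when the matched count exceeds the cap.
if (db.collection.countDocuments(filter) > 500) {
  pipeline.push({ $sample: { size: 500 } });
}
db.collection.aggregate(pipeline);

The count can drift between the two calls, so treat the 500 as approximate.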
Upvotes: 1