joe.kovalski

Reputation: 299

MongoDB aggregate - get N random documents on document's count condition

What I am trying to achieve is to make $limit a bit smarter, as I need to limit my aggregation to evenly distributed N documents (not just the first N documents). Is it doable?

My first guess is to use $sample if count>N. Pseudocode for N = 500:

{
   $match: { some_dummy_initial_condition },
},
{ // Conditional step:
  IF $count > 500 THEN $sample: { 500 }
}

Edit: If there is no way to skip any pipeline stage then maybe there is a way to use $limit that will uniformly distribute my documents (like skipping every nth document, so I would not exceed my upper N limit)?

Upvotes: 0

Views: 346

Answers (1)

turivishal

Reputation: 36114

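A pipeline stage cannot be skipped conditionally, but you can try the following workaround. It is not efficient in memory or speed on a large collection, because $facet gathers every matching document into a single array, and that output document is subject to the 16 MB BSON size limit: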

  • $facet to build two arrays: data, holding all matching documents, and sample, holding 500 randomly sampled documents
  • $project to check whether the size of data is greater than 500; if so, return the sample array as data, otherwise return the data array
  • $unwind to deconstruct the data array
  • $replaceWith to promote each data object to the root
db.collection.aggregate([
  { $match: { some_dummy_initial_condition } },
  {
    $facet: {
      data: [],
      sample: [ { $sample: { size: 500 } } ]
    }
  },
  {
    $project: {
      data: {
        $cond: [
          { $gt: [{ $size: "$data" }, 500] },
          "$sample",
          "$data"
        ]
      }
    }
  },
  { $unwind: "$data" },
  { $replaceWith: "$data" }
])
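
If the cap or the filter needs to vary, the same pipeline can be wrapped in a small function. This is only a sketch: buildCappedSamplePipeline and the status field are made-up names for illustration, and the stages are copied from the pipeline above.

```javascript
// Hypothetical helper, not part of the original answer: returns the same
// pipeline with the cap `n` and the $match filter passed in as arguments.
function buildCappedSamplePipeline(filter, n) {
  return [
    { $match: filter },
    {
      $facet: {
        // Every matching document, unchanged.
        data: [],
        // Up to n randomly chosen matching documents.
        sample: [{ $sample: { size: n } }]
      }
    },
    {
      $project: {
        // Keep the random sample only when more than n documents matched.
        data: {
          $cond: [{ $gt: [{ $size: "$data" }, n] }, "$sample", "$data"]
        }
      }
    },
    // Turn the chosen array back into one document per result.
    { $unwind: "$data" },
    { $replaceWith: "$data" }
  ];
}

// Example usage; the `status` field is an assumption for illustration.
const pipeline = buildCappedSamplePipeline({ status: "active" }, 500);
// db.collection.aggregate(pipeline)
```

Passing n as an argument keeps the $sample size and the $cond threshold in sync, which is easy to get wrong when both are hard-coded.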


Upvotes: 1
