gabbar0x

Retrieve Documents where sum of sizes of arrays is greater than given value

I have a Mongoose Schema defined as such:

import mongoose from 'mongoose';

const hackathonSchema = new mongoose.Schema({
    hackathonId: {type: Number, required: true},
    uuid: {type: String, required: true},
    data: {type: Object, required: true},
    isPublished: {type: Boolean, default: false},
    organisers: [String],
    volunteers: [String],
    participants: [String],
    mentors: [String]
});

export default mongoose.model('Hackathon', hackathonSchema);

I want to retrieve all the Hackathons where the combined length of:

( organisers + volunteers + participants + mentors ) >= 500

or, for that matter, any other value.

I found an answer on SO that does this, but not in Mongoose: How to select where sum of fields is greater than a value in MongoDB


Neil Lunn

Simply add the sizes together:

With MongoDB 3.4 or greater, using $concatArrays:

Model.aggregate([
  { "$redact": {
    "$cond": {
      "if": {
        "$gte": [
          { "$size": {
            "$concatArrays": [
              { "$ifNull": [ "$organisers", [] ] },
              { "$ifNull": [ "$volunteers", [] ] },
              { "$ifNull": [ "$participants", [] ] },
              { "$ifNull": [ "$mentors", [] ] }
            ]
          } },
          500
        ]
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  { "$project": { "_id": 1 } }
]).exec().then(function(results) {
  // results is an array of { _id } documents for the matching hackathons
}).catch(function(err) {
  // handle the error
});

Or, in earlier versions without that operator, sum the individual sizes with $add:

Model.aggregate([
  { "$redact": {
    "$cond": {
      "if": {
        "$gte": [
          { "$add": [
            { "$size": { "$ifNull": [ "$organisers", [] ] } },
            { "$size": { "$ifNull": [ "$volunteers", [] ] } },
            { "$size": { "$ifNull": [ "$participants", [] ] } },
            { "$size": { "$ifNull": [ "$mentors", [] ] } }
          ]},
          500
        ]
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }},
  { "$project": { "_id": 1 } }
]).exec().then(function(results) {
  // results is an array of { _id } documents for the matching hackathons
}).catch(function(err) {
  // handle the error
});

In either approach, $redact acts as a logical filter on the documents in the collection. Because it is a natively implemented operator, it evaluates this condition far faster than JavaScript-based alternatives such as $where.

Here its only argument is $cond, a "ternary" operation ( if/then/else ) that evaluates a condition and returns a value. So when the "if" condition evaluates to true, "then" the action is to $$KEEP the document; otherwise, "else" the action is to $$PRUNE the document from the results.
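To make the KEEP/PRUNE semantics concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB implements $redact: it only mirrors the top-level decision of a $cond that returns $$KEEP or $$PRUNE, and it ignores $$DESCEND and subdocument handling. The function names and sample documents are hypothetical.

```javascript
// Hypothetical in-memory sketch of the $redact + $cond KEEP/PRUNE pattern.
// Real $redact runs inside MongoDB; this only mirrors the top-level decision.
const KEEP = "$$KEEP";
const PRUNE = "$$PRUNE";

// Equivalent of { "$cond": { "if": test(doc), "then": KEEP, "else": PRUNE } }
function condKeepOrPrune(doc, test) {
  return test(doc) ? KEEP : PRUNE;
}

// Keep only the documents whose $cond evaluates to $$KEEP
function redact(docs, test) {
  return docs.filter((doc) => condKeepOrPrune(doc, test) === KEEP);
}

// Made-up documents and condition: more than 2 entries across two arrays
const docs = [
  { _id: 1, organisers: ["a"], mentors: ["b", "c"] },
  { _id: 2, organisers: ["a"] }
];
const kept = redact(docs, (d) =>
  (d.organisers || []).length + (d.mentors || []).length > 2
);

console.log(kept.map((d) => d._id)); // [ 1 ]
```

Document 1 has three entries in total and is kept; document 2 has only one and is pruned.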

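The differing approaches based on versions are either to join all four arrays with $concatArrays (MongoDB 3.4 or greater) and take the $size of the combined array, or to take the $size of each array individually and sum the results with $add. In both cases $ifNull substitutes an empty array for a missing field, because $size raises an error when its argument is not an array.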
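Since the question asks for "any value", it may help to build the stage from a list of field names and a threshold instead of hard-coding 500. The following is a hypothetical helper (orEmpty, totalSizeExpr and buildRedactPipeline are not part of Mongoose or MongoDB) that produces either version-dependent variant; it assumes you want totals greater than or equal to the threshold, hence $gte.

```javascript
// Hypothetical helpers, not part of Mongoose or MongoDB: build the pipeline
// for either variant, for any list of array fields and any threshold.

// $size errors on a missing field, so fall back to an empty array
function orEmpty(field) {
  return { "$ifNull": ["$" + field, []] };
}

function totalSizeExpr(fields, useConcatArrays) {
  if (useConcatArrays) {
    // MongoDB 3.4+: join all arrays, then take a single $size
    return { "$size": { "$concatArrays": fields.map(orEmpty) } };
  }
  // Earlier versions: $size of each array, summed with $add
  return { "$add": fields.map((f) => ({ "$size": orEmpty(f) })) };
}

function buildRedactPipeline(fields, minTotal, useConcatArrays) {
  return [
    { "$redact": {
      "$cond": {
        // $gte so that a total exactly equal to minTotal is included
        "if": { "$gte": [totalSizeExpr(fields, useConcatArrays), minTotal] },
        "then": "$$KEEP",
        "else": "$$PRUNE"
      }
    }},
    { "$project": { "_id": 1 } }
  ];
}

const hackathonFields = ["organisers", "volunteers", "participants", "mentors"];
const pipeline = buildRedactPipeline(hackathonFields, 500, true);
// e.g. Model.aggregate(pipeline).exec() in Mongoose
console.log(JSON.stringify(pipeline[0]));
```

Pass false as the third argument to get the $add form for servers older than 3.4.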

As for returning only the _id field, that is simply a matter of adding a $project stage where, just as in a regular query projection, you list the fields to return. In this case, that is only the _id field.

You could add a $match stage first that applies presumptions about the minimum array lengths in the base query, but those would be presumptions rather than absolute facts, so the $redact stage is still needed to test the actual total.
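One presumption that is always safe follows from the pigeonhole principle: if four arrays together hold at least 500 elements, at least one of them holds 125 or more. A minimal sketch of such a $match stage is below, using a hypothetical helper name. It relies on dot notation ("field.N" matches when the array has an element at index N), only narrows the candidates for $redact, and whether it speeds anything up depends on your data.

```javascript
// Hypothetical sketch: a $match stage to place before $redact. If k arrays
// together hold at least minTotal elements, at least one of them holds
// ceil(minTotal / k) or more (pigeonhole principle). So this stage never
// drops a qualifying document; it only narrows what $redact must check.
function buildPrefilterMatch(fields, minTotal) {
  const minLargest = Math.ceil(minTotal / fields.length);
  if (minLargest <= 0) {
    return { "$match": {} }; // every document qualifies; nothing to narrow
  }
  // "field.N" matches only when the array has an element at index N
  const index = minLargest - 1;
  return {
    "$match": {
      "$or": fields.map((f) => ({ [f + "." + index]: { "$exists": true } }))
    }
  };
}

const stage = buildPrefilterMatch(
  ["organisers", "volunteers", "participants", "mentors"], 500);
console.log(Object.keys(stage.$match.$or[0])); // [ 'organisers.124' ]
```

The stage would go first in either pipeline, followed by the unchanged $redact and $project stages.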


For the record, you can express the same condition with a $where clause. However, since this operator relies on JavaScript evaluation rather than being natively implemented as the aggregation framework operations are, it runs significantly slower:

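Model.find({ "$where": function() {
  // Sum the array lengths, treating a missing array as empty
  return (this.organisers || []).length +
    (this.volunteers || []).length +
    (this.participants || []).length +
    (this.mentors || []).length >= 500;
} }).select({ "_id": 1 }).exec().then(function(results) {
  // results is an array of { _id } documents for the matching hackathons
}).catch(function(err) {
  // handle the error
});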

So whilst it may "look pretty" compared to the DSL form of an aggregation pipeline structure, the performance penalty is not really worth it. You should only do this if your MongoDB version lacks $redact as an operator, which would be prior to MongoDB 2.6. And in that case, you should probably be updating MongoDB for other reasons as well.
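If you do use $where, note that the function it receives is an ordinary JavaScript function that reads the current document through `this`, so a predicate like it can be sanity-checked locally with Function.prototype.call before it is sent to the server. The sketch below is a hypothetical variant that treats missing arrays as empty and uses >= 500 as in the question; the documents are made-up test data.

```javascript
// Hypothetical predicate in the shape $where expects: a plain function
// (not an arrow function) that reads the document through `this`.
// Missing arrays count as empty.
function hasAtLeast500() {
  return (this.organisers || []).length +
    (this.volunteers || []).length +
    (this.participants || []).length +
    (this.mentors || []).length >= 500;
}

// Made-up documents; .call() binds each one as `this`
const bigHackathon = {
  participants: new Array(480).fill("p"),
  mentors: new Array(20).fill("m")
};
const smallHackathon = { organisers: ["o"], volunteers: ["v"] };

console.log(hasAtLeast500.call(bigHackathon));   // true
console.log(hasAtLeast500.call(smallHackathon)); // false
```

This only verifies the JavaScript logic locally; it says nothing about how the query performs on the server.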
