David Alsh

Reputation: 7649

How do I identify that an array has unique entries in MongoDB?

I have an array of strings

users: ['user1', 'user2']

If I run a search looking for exactly ['user1', 'user2'] in that order, it will find that entry. However, if the elements are in the reverse order, the query returns nothing.

What's the best way to compare an input array against the list in the database to determine if it is a unique entry?

Upvotes: 2

Views: 87

Answers (2)

Puneet Singh

Reputation: 3543

You can identify a unique array in a collection with the query below.

db.getCollection('mycollection').find({users:  { $size: 2, $all: [ "user1" , "user2" ]  }})

You need to specify the number of elements in the array you are checking with $size, and check that all of the elements are present with the $all operator.
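A small helper can build this filter from any input array, so that the $size value always agrees with the list passed to $all. This is only a sketch: exactMembersFilter is a hypothetical name, not a MongoDB function, and it assumes the stored arrays hold no duplicates (a stored ['user1', 'user2', 'user2'] has size 3 and would not match).

```javascript
// Hypothetical helper (not part of MongoDB): builds a { $size, $all } filter
// for the given field from an input array, whatever its order.
function exactMembersFilter(field, values) {
    // Drop duplicate input values so $size counts distinct members only.
    var distinct = Array.from(new Set(values));
    var filter = {};
    filter[field] = { $size: distinct.length, $all: distinct };
    return filter;
}

var filter = exactMembersFilter("users", ["user2", "user1"]);
console.log(JSON.stringify(filter));
// In the mongo shell: db.getCollection('mycollection').find(filter)
```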

Upvotes: 1

chridam

Reputation: 103365

Using the aggregation framework with the $redact pipeline operator allows you to process the logical condition with the $cond operator. It uses the system variable $$KEEP to keep a document where the logical condition is true, or $$PRUNE to remove a document where the condition is false.

This operation is similar to having a $project stage that selects the fields in the collection and creates a new field holding the result of the logical condition, followed by a $match on that field, except that $redact does it in a single pipeline stage, which is more efficient.
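For comparison, the two-stage form might look like the sketch below. The field name isMatch and the function name projectThenMatch are arbitrary choices, and $setEquals is used as the condition purely for illustration. It also assumes the input values do not start with "$", since such strings would be read as field paths inside an expression.

```javascript
// Builds a $project + $match pipeline that mirrors a single $redact stage.
function projectThenMatch(arr) {
    return [
        // Stage 1: keep the users field and store the condition result in isMatch.
        { $project: { users: 1, isMatch: { $setEquals: ["$users", arr] } } },
        // Stage 2: keep only the documents where the condition was true.
        { $match: { isMatch: true } }
    ];
}

var pipeline = projectThenMatch(["user2", "user1"]);
console.log(JSON.stringify(pipeline, null, 2));
// In the mongo shell: db.collection.aggregate(pipeline)
```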

As for the logical condition, you can use the set operators, since they perform set operations on arrays, treating the arrays as sets. Set expressions ignore duplicate entries in each input array as well as the order of the elements, which suits your case since you want to disregard the order of the elements.
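To make the set semantics concrete, here is a plain-JavaScript approximation for arrays of strings. It is not MongoDB's implementation, and the function name setEquals is only chosen for illustration; it simply shows that order and duplicates are both ignored.

```javascript
// Plain-JavaScript approximation of set equality for arrays of strings:
// order and duplicate entries are ignored.
function setEquals(a, b) {
    var setA = new Set(a);
    var setB = new Set(b);
    if (setA.size !== setB.size) return false;
    return Array.from(setA).every(function (x) { return setB.has(x); });
}

var input = ["user2", "user1"];
console.log(setEquals(["user1", "user2"], input));          // true
console.log(setEquals(["user1", "user2", "user2"], input)); // true (duplicate ignored)
console.log(setEquals(["user1", "user2", "user3"], input)); // false
console.log(setEquals(["user1", "user3"], input));          // false
```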

There are a couple of these operators that you can use in the logical condition, namely $setEquals and $setDifference.


Consider the following examples which demonstrate the above concept:

Populate Test Collection

db.collection.insert([
    { users: ['user1', 'user2'] },
    { users: ['user1', 'user2', 'user2'] },
    { users: ['user1', 'user2', 'user3'] },
    { users: ['user1', 'user3'] },
])

Example 1: $redact with $setEquals

var arr = [ "user2", "user1" ];
db.collection.aggregate([
    { 
        "$redact": {
            "$cond": [
                { "$setEquals": [ "$users", arr ] },
                "$$KEEP",
                "$$PRUNE"
            ]
        }
    }
])

Sample Output

/* 1 */
{
    "_id" : ObjectId("5804902900ce8cbd028523d1"),
    "users" : [ 
        "user1", 
        "user2"
    ]
}

/* 2 */
{
    "_id" : ObjectId("5804902900ce8cbd028523d2"),
    "users" : [ 
        "user1", 
        "user2", 
        "user2"
    ]
}

Example 2: $redact with $setDifference

var arr = [ "user2", "user1" ];
db.collection.aggregate([
    { 
        "$redact": {
            "$cond": [
                { 
                    "$eq": [
                        { "$setDifference": [ "$users", arr ] },
                        []
                    ]
                },
                "$$KEEP",
                "$$PRUNE"
            ]
        }
    }
])

Sample Output

/* 1 */
{
    "_id" : ObjectId("5804902900ce8cbd028523d1"),
    "users" : [ 
        "user1", 
        "user2"
    ]
}

/* 2 */
{
    "_id" : ObjectId("5804902900ce8cbd028523d2"),
    "users" : [ 
        "user1", 
        "user2", 
        "user2"
    ]
}
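Note that the $setDifference condition in Example 2 is one-directional: it keeps any document whose users are all contained in arr, so a document such as { users: ['user1'] } (not in the test collection) would be kept as well. The plain-JavaScript sketch below approximates that behaviour for arrays of strings and contrasts it with a two-way check. The helper names are hypothetical.

```javascript
// Approximates $setDifference: distinct elements of a that are not in b.
function setDifference(a, b) {
    var setB = new Set(b);
    return Array.from(new Set(a)).filter(function (x) { return !setB.has(x); });
}

// Mirrors the Example 2 condition: users minus arr is empty.
function keptByExample2(users, arr) {
    return setDifference(users, arr).length === 0;
}

// Two-way check: neither side has a member the other lacks.
function sameMembers(users, arr) {
    return keptByExample2(users, arr) && keptByExample2(arr, users);
}

var arr = ["user2", "user1"];
console.log(keptByExample2(["user1"], arr));       // true  (subset is kept)
console.log(sameMembers(["user1"], arr));          // false
console.log(sameMembers(["user1", "user2"], arr)); // true
```

In the pipeline itself, the two-way form would wrap two $eq checks (one per direction) in $and, or simply use $setEquals as in Example 1.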

Another approach, recommended only when $redact is not available, is to use the $where operator as follows:

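db.collection.find({
    "$where": function() {
        // Sort copies so the stored array is not reordered in place
        var arr = ["user2", "user1"].sort();
        var users = this.users.slice().sort();
        // Equal when both sorted arrays have the same length and elements
        return users.length === arr.length &&
            users.every(function(u, i) { return u === arr[i]; });
    }
})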

However, bear in mind that this won't perform very well, since a query with the $where operator calls the JavaScript engine to evaluate JavaScript code against every document it checks.

A $where expression cannot use an index, so on its own it has to scan the whole collection. MongoDB evaluates the non-$where parts of a query before the $where expression, and those parts may use an index.

It is therefore advisable to combine $where with indexed query conditions where you can, so that fewer documents reach the JavaScript check. Even so, use JavaScript expressions and the $where operator only as a last resort, when you can't structure the data in any other way or when you are dealing with a small subset of data.

Upvotes: 0
