BarberDucky
BarberDucky

Reputation: 11

How to create an index on array elements that match a specific filter in MongoDB?

Say that there is a collection of objects, each one containing an array of elements, with each element containing fields attributeName and attributeValue. How would one create an index over attributeValue, but only of the values whose corresponding attributeName equals to a specific value?

Example collection:

  { "_id": 0, "attributes": 
    [
      {"attributeName": "name", "attributeValue": "John", ...},
      {"attributeName": "age", "attributeValue": "30", ...}
    ]
  },
  { "_id": 1, "attributes": 
    [
      {"attributeName": "name", "attributeValue": "Brian", ...},
      {"attributeName": "gender", "attributeValue": "male", ...}
    ]
  },
  { "_id": 2, "attributes": 
    [
      {"attributeName": "name", "attributeValue": "Kevin", ...},
      {"attributeName": "age", "attributeValue": "35", ...}
    ]
  }

For the given example, how would we create an index for values where "attributeName" == "age" (values 30 and 35 in this case)?

Upvotes: 1

Views: 2105

Answers (2)

barrypicker
barrypicker

Reputation: 10108

Partial indexes have been suggested by 2 users. But even partial indexes has problems with this query. If I understand your requirements, you wish to only index on documents having an attribute element of {"name": "age": 30} OR {"name": "age", "age": 35 }. Your original documents show age as a string not an integer, but I believe integer will suffice for this discussion.

Partial filter expressions do not allow an IN condition or an OR condition, but it does allow for an AND condition. Additionally we cannot create two near-identical indexes on the same fields, Mongo restricts this. For these reasons we cannot create an index on 30 or 35, but can create an index on BETWEEN 30 and 35 inclusively.

db.test.createIndex(
   { "attributes.attributeValue": 1, "attributes.attributeName": 1 },
   {
       partialFilterExpression: 
       {
           $and:
           [
               {"attributes.attributeName": "age"} , {"attributes.attributeValue": {$gte: 30} }, {"attributes.attributeValue": { $lte: 35} }
           ]
       }
   }
)

Now querying this data and utilizing the index is entirely another matter.

We can query documents in an obvious manner...

db.test.find({"attributes.attributeValue": 30, "attributes.attributeName": "age"}).pretty()

... but this will likely not result in what we want. For example, consider this document...

{ "_id": 3, "attributes": 
    [
      {"attributeName": "name", "attributeValue": "Alisa"},
      {"attributeName": "age", "attributeValue": 17},
      {"attributeName": "favoriteNumber", "attributeValue": 30}
    ]
  }

This document will be returned by the query above because as a document it has both "attributes.attributeName" containing "age" and it has "attributes.attributeValue" of 30. Different elements in the array, albeit, but still it matches the query definition. I believe we want to find only documents having an "attributes" document with both age and 30 in the same sub document. For this we need elemMatch...

db.test.find( { "attributes": { $elemMatch: { "attributeName": "age", "attributeValue": 30 } } } ).pretty()

When I use this query, I receive the expected results, but upon evaluation of an explain plan I show this is not using my index. This is performing a collection scan...

db.test.find( { "attributes": { $elemMatch: { "attributeName": "age", "attributeValue": 30 } } } ).explain("allPlansExecution")

... so what gives? As it turns out, in order to use this index, we need to have both styles of the query. We need to include each field separately, but also use an elemMatch...

db.test.find( { "attributes.attributeName": "age", "attributes.attributeValue": 30, "attributes": { $elemMatch: { "attributeName": "age", "attributeValue": 30 } } } ).pretty()

.. Now this query gives the proper results and it utilizes the indexes....

db.test.find( { "attributes.attributeName": "age", "attributes.attributeValue": 30, "attributes": { $elemMatch: { "attributeName": "age", "attributeValue": 30 } } } ).explain("allPlansExecution")

Conclusion:

Cannot have a targeted partial filter expression, best we can do is a range. If using a partial index on array elements we must include the array elements individually in conjunction with an elemMatch to utilize the index. Data types must match. If I query with "30" (as a string) it will not find the data, and will not use the index.

Side Note:

Indexing on key value pairs in an array is known as the Attribute Pattern. See https://www.mongodb.com/blog/post/building-with-patterns-the-attribute-pattern for details. The compound index is built with the value field first, then the key field second. This is intentional as the value field will likely be more selective and allow the index scan to be more efficient.

Upvotes: 2

Joe
Joe

Reputation: 28366

MongoDB doesn't support indexing in that manner.

You could use a partial index to determine which documents to index.

For each indexed document all of the elements of the array will be included in the index.

Upvotes: 1

Related Questions