Stewart

Reputation: 18313

Filtering in Atlas Vector Search

I'm following this tutorial on Spring Boot AI with MongoDB

The only difference is that I'm using the latest version of Spring AI, 1.0.0-M6, whose syntax differs slightly from the version used in the tutorial.

Everything works up to the last section, which filters the returned documents. By stepping through the Spring code in a debugger, I determined that the failing query is:

{
  "aggregate": "__collection__",
  "pipeline": [
    {
      "$vectorSearch": {
        "queryVector": [
          ..... 1536 floating point values .....
        ],
        "path": "embedding",
        "numCandidates": 200,
        "index": "vector_index",
        "limit": 2,
        "filter": {
          "metadata.author": {
            "$eq": "A"
          }
        }
      }
    },
    {
      "$addFields": {
        "score": {
          "$meta": "vectorSearchScore"
        }
      }
    },
    {
      "$match": {
        "score": {
          "$gte": 0.5
        }
      }
    }
  ]
}

The exception thrown is below. The part that caught my attention was the phrase "Path 'metadata.author' needs to be indexed as token".

com.mongodb.MongoCommandException: Command failed with error 8 (UnknownError): 'PlanExecutor error during aggregation :: caused by :: Path 'metadata.author' needs to be indexed as token' on server shard-00-02.2arll.mongodb.net:27017. The full response is {"ok": 0.0, "errmsg": "PlanExecutor error during aggregation :: caused by :: Path 'metadata.author' needs to be indexed as token", "code": 8, "codeName": "UnknownError", "$clusterTime": {"clusterTime": {"$timestamp": {"t": 1740598093, "i": 50}}, "signature": {"hash": {"$binary": {"base64": "2iW7YMGxdKhcr/ArTYifjGdZWGg=", "subType": "00"}}, "keyId": 7420727649443512328}}, "operationTime": {"$timestamp": {"t": 1740598093, "i": 50}}}

So I added a new index for that:

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "metadata.author": {
        "normalizer": "lowercase",
        "type": "token"
      }
    }
  }
}

But the same error occurs.

Upvotes: 0

Views: 28

Answers (1)

Stewart

Reputation: 18313

The tutorial also says:

You must add the path for your metadata field to your Atlas Vector Search index. See the About the filter Type section of the How to Index Fields for Vector Search tutorial to learn more.

It then links to this page, which says:

You can optionally index boolean, date, number, objectId, string, and UUID fields to pre-filter your data. Filtering your data is useful to narrow the scope of your semantic search and ensure that not all vectors are considered for comparison. It reduces the number of documents against which to run similarity comparisons, which can decrease query latency and increase the accuracy of search results.

You must index the fields that you want to filter by using the filter type inside the fields array.

It then gives this example:

{
  "fields":[
    {
      "type": "vector",
      ...
    },
    {
      "type": "filter",
      "path": "<field-to-index>"
    },
    ...
  ]
}

The important thing to understand is that the filter is a pre-filter. It filters before the vector search.
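To make the pre-filter idea concrete, here is a minimal in-memory sketch in plain Java. It is not how Atlas implements $vectorSearch (the real stage runs an approximate nearest-neighbour search over numCandidates), and the Doc class, the sample scores, and the method names are all hypothetical. It only illustrates why the order of filtering and top-k selection matters:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class PreFilterDemo {

    // Hypothetical stand-in for a stored document: an author and a precomputed similarity score.
    public static final class Doc {
        public final String id;
        public final String author;
        public final double score;

        public Doc(String id, String author, double score) {
            this.id = id;
            this.author = author;
            this.score = score;
        }

        @Override
        public String toString() {
            return id + "(" + author + ", " + score + ")";
        }
    }

    // Orders documents from highest to lowest similarity score.
    private static final Comparator<Doc> BEST_FIRST =
            Comparator.comparingDouble((Doc d) -> d.score).reversed();

    // Pre-filter: restrict to the author first, then keep the k best-scoring documents.
    public static List<Doc> preFilterTopK(List<Doc> docs, String author, int k) {
        return docs.stream()
                .filter(d -> d.author.equals(author))
                .sorted(BEST_FIRST)
                .limit(k)
                .collect(Collectors.toList());
    }

    // Post-filter: keep the k best-scoring documents overall, then drop other authors.
    public static List<Doc> postFilterTopK(List<Doc> docs, String author, int k) {
        return docs.stream()
                .sorted(BEST_FIRST)
                .limit(k)
                .filter(d -> d.author.equals(author))
                .collect(Collectors.toList());
    }

    // Made-up sample data: the two best-scoring documents belong to author "B".
    public static List<Doc> sampleDocs() {
        return Arrays.asList(
                new Doc("d1", "B", 0.95),
                new Doc("d2", "B", 0.90),
                new Doc("d3", "A", 0.80),
                new Doc("d4", "A", 0.70));
    }

    public static void main(String[] args) {
        List<Doc> docs = sampleDocs();
        System.out.println("pre-filter:  " + preFilterTopK(docs, "A", 2));
        System.out.println("post-filter: " + postFilterTopK(docs, "A", 2));
    }
}
```

With this sample data, pre-filtering returns both of author A's documents, while post-filtering returns nothing because the top two overall belong to author B. Since the filter runs as part of the vector search itself, it makes sense that the filter field has to be declared in the vector search index rather than in a separate index.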

So the fix was to remove the separate token index and declare both the vector field and the filter field in the vector search index definition, like this:

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    },
    {
      "path": "metadata.author",
      "type": "filter"
    }
  ]
}

This works now.

Upvotes: 0
