jzacharuk
jzacharuk

Reputation: 2116

MongoDB fuzzy text search or simple sorting but always with pagination token

My question is: what is the best practice for querying a MongoDB when you need to do either a fuzzy text search and sort by search score, or do an unfiltered query that sorts by a property, but both are required to return a pagination token?

To elaborate, I have a MongoDB with documents that contain a name and description property and I have the following business requirements:

I have hacked together the following solution. Basically doing the fuzzy search, or doing a .* regex search if there has been no input from the user. This seems like it will be bad for performance once the collection grows, but it was the only way I could think to still utilize a $search in both cases so I could perform pagination without using skip/limit.

pipeline = []

if search:
    pipeline.append({
        '$search': {
            'index': 'autocomplete_index',
            'autocomplete': {
                'path': 'description',
                'query': search,
                'tokenOrder': 'any',
                'fuzzy': {
                    'maxEdits': 2,
                    'prefixLength': 1,
                    'maxExpansions': 256
                }
            },
            "sort": {'unused': {'$meta': "searchScore"}},
        }
    })
else:
    pipeline.append({
        '$search': {
            'index': 'regex_index',
            'regex': {
                'path': 'name',
                'query': '.*',
                'allowAnalyzedField': True
            },
            "sort": { 'name': 1 }
        }
    })

if pagination_token:
    for stage in pipeline:
        if '$search' in stage:
            stage['$search']['searchAfter'] = pagination_token

pipeline.append({
    "$limit": page_size
})

pipeline.append({
    "$project": {
        "_id": 1,
        "name": 1,
        "description": 1,
        "pagination_token": { "$meta" : "searchSequenceToken" }
    }
})

results = self.collection.aggregate(pipeline)

I have had to create the following indexes to support the current solution

autocomplete_index

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "description": {
        "type": "autocomplete"
      }
    }
  }
}

regex_index:

{
  "mappings": {
    "fields": {
      "name": [
        {
          "type": "token"
        },
        {
          "type": "string"
        }
      ]
    }
  }
}

Upvotes: 0

Views: 289

Answers (0)

Related Questions