BadPiggie

Reputation: 6384

Some documents do not appear in Atlas Search when querying by only a few letters

I have a collection. The document structure is,

{
  model: {
    name: 'string name'
  }
}

I have enabled Atlas Search and created a search index on the model.name field. Search works fine; the only issue is that I can't get results when the query contains only the first few letters.

Example:

I have a document,

{
  model: {
     name: "space1duplicate"
  }
}

If I query space, I don't get the result.

{
  index: 'search_index',
  compound: {
    must: [
      {
        text: {
          query: 'space',
          path: 'model.name'
        }
      }
    ]
  }
}

But if I query space1duplica, it returns the result.

Upvotes: 1

Views: 1484

Answers (2)

qwerty

Reputation: 198

During indexing, a full-text search engine tokenizes the input, splitting the text into searchable chunks. Check out the relevant section in the documentation.

By default, Atlas Search does not split words at digits. If you need that, define a custom analyzer with the regexSplit tokenizer and use it for your field:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": [
        {
          "analyzer": "digitSplitter",
          "type": "string"
        }
      ]
    }
  },
  "analyzers": [
    {
      "charFilters": [],
      "name": "digitSplitter",
      "tokenFilters": [],
      "tokenizer": {
        "pattern": "[0-9]+",
        "type": "regexSplit"
      }
    }
  ]
}

Also note that you can use multiple analyzers for string fields, if needed.
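To make the effect of the "[0-9]+" pattern concrete, here is a minimal sketch in plain JavaScript. It only mimics what a split-on-regex tokenizer would produce for the example value; `digitSplitTokens` is a hypothetical helper, not part of Atlas Search or any MongoDB driver, and real analyzers may apply further processing.

```javascript
// Hypothetical stand-in for a tokenizer that splits on runs of digits.
// This is an illustration only; it does not call Atlas Search.
function digitSplitTokens(text) {
  // Split wherever one or more digits occur and drop empty pieces.
  return text.split(/[0-9]+/).filter((token) => token.length > 0);
}

const digitTokens = digitSplitTokens("space1duplicate");
console.log(digitTokens); // [ 'space', 'duplicate' ]
```

With tokens like these in the index, a text query for space has an exact token to match.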

Upvotes: 3

Alex Blex

Reputation: 37018

Atlas Search uses Lucene to do the job. The documentation on the MongoDB site mostly focuses on the MongoDB-specific syntax for passing the query to Lucene, and it might be a bit confusing if you are not familiar with Lucene's query language.

First of all, there are a number of tokenizers and analyzers available, each serving a specific purpose. You really need to include the index definition when you ask questions about Atlas Search.

The default (standard) analyzer splits text on word separators to build the index and lowercases the tokens. Stemming, which removes endings to store stems, is done by the language analyzers, such as the English one.
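As a rough illustration of word-separator tokenization, the sketch below splits on anything that is not a letter or digit. It is plain JavaScript, not Lucene, and `wordTokens` is a hypothetical helper; the point is only that "space1duplicate" contains no separator, so it stays a single token and the term space on its own matches nothing.

```javascript
// Hypothetical approximation of word-boundary tokenization:
// lowercase the text, then split on any run of non-alphanumeric characters.
function wordTokens(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((token) => token.length > 0);
}

const wordTokenList = wordTokens("space1duplicate");
console.log(wordTokenList); // [ 'space1duplicate' ]
console.log(wordTokenList.includes("space")); // false
```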

So, to find "space1duplicate" by the beginning of the word, you can use the "autocomplete" field type with nGram tokenization. The index should be created as follows:

{
  "mappings": {
    "dynamic": false,
    "fields": {
      "name": {
        "tokenization": "nGram",
        "type": "autocomplete"
      }
    }
  },
  "storedSource": {
    "include": [
      "name"
    ]
  }
}

Once it's indexed (you may need to wait a bit if you have a larger dataset), you can find the document with the following search:

{
  index: 'search_index',
  compound: {
    must: [
      {
        autocomplete: {
          query: 'spa',
          path: 'name'
        }
      }
    ]
  }
}
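To see why a short query such as spa can match, here is a minimal sketch of n-gram generation in plain JavaScript. `nGrams` is a hypothetical helper, and the bounds of 2 and 15 are an assumption about the `minGrams` and `maxGrams` defaults; check the index definition you actually use. The real matching logic in Atlas Search is more involved than a set lookup.

```javascript
// Hypothetical helper: collect every substring whose length lies
// between minGrams and maxGrams (assumed defaults: 2 and 15).
function nGrams(text, minGrams = 2, maxGrams = 15) {
  const grams = new Set();
  for (let start = 0; start < text.length; start++) {
    for (let len = minGrams; len <= maxGrams && start + len <= text.length; len++) {
      grams.add(text.slice(start, start + len));
    }
  }
  return grams;
}

const grams = nGrams("space1duplicate");
console.log(grams.has("spa"));     // true
console.log(grams.has("duplica")); // true
console.log(grams.has("s"));       // false, shorter than minGrams
```

Because every such substring is stored, the index grows noticeably with nGram tokenization; that is the trade-off for matching on partial words.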

Upvotes: 3
