Shezan Kazi

Reputation: 4660

ElasticSearch Suggester full-text-search

I'm using django_elasticsearch_dsl.

My Document:

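import django_elasticsearch_dsl
from django_elasticsearch_dsl import fields
from django_elasticsearch_dsl.fields import TextField
from elasticsearch_dsl import analyzer

html_strip = analyzer(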
    'html_strip',
    tokenizer='standard',
    filter=["lowercase", "stop", "snowball"],
    char_filter=["html_strip"]
)

class Document(django_elasticsearch_dsl.Document):
    name = TextField(
        analyzer=html_strip,
        fields={
            'raw': fields.KeywordField(),
            'suggest': fields.CompletionField(),
        }
    )
    ...

My request:

_search = Document.search().suggest("suggestions", text=query, completion={'field': 'name.suggest'}).execute()

I have the following document "names" indexed:

"This is a test"
"this is my test"
"this test"
"Test this"

Now if I search for "This is my test", I receive only

"this is my test"

However, if I search for "test", all I get is

"Test this"

even though I want all documents that have "test" in their name.

What am I missing?

Upvotes: 0

Views: 1202

Answers (2)

Bhavya

Reputation: 16192

Based on the user's comment, I am adding another answer that uses n-grams.

Here is a working example with the index mapping, index data, search query, and search result.

Index Mapping:

{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 4,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    },
    "max_ngram_diff": 50
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}

Index Data:

{
  "name": [
    "Test this"
  ]
}

{
  "name": [
    "This is a test"
  ]
}

{
  "name": [
    "this is my test"
  ]
}

{
  "name": [
    "this test"
  ]
}

Analyze API:

POST /stof_64281341/_analyze

{
  "analyzer" : "ngram_analyzer",
  "text" : "this is my test"
}

The following tokens are generated:

{
  "tokens": [
    {
      "token": "this",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "test",
      "start_offset": 11,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
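To see why only "this" and "test" come out, here is a rough pure-Python imitation of the analyzer above. It is only a sketch: a whitespace split stands in for the standard tokenizer, the function names ngrams and analyze are made up for illustration, and the token order may differ from what Elasticsearch actually emits.

```python
def ngrams(token, min_gram=4, max_gram=20):
    """Return every substring of token whose length is in [min_gram, max_gram]."""
    grams = []
    for start in range(len(token)):
        for size in range(min_gram, max_gram + 1):
            if start + size > len(token):
                break  # no longer substrings fit from this start position
            grams.append(token[start:start + size])
    return grams


def analyze(text, min_gram=4, max_gram=20):
    """Lowercase, split on whitespace, then expand each word into n-grams."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(ngrams(word, min_gram, max_gram))
    return tokens


print(analyze("this is my test"))
print(ngrams("testing"))
```

Words shorter than min_gram ("is", "my") yield no tokens at all, while a longer word such as "testing" is also indexed as "test", "testi", "esti" and so on, which is what lets a match query find partial words.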

Search Query:

{
    "query": {
        "match": {
           "name": "test"
        }
    }
}

Search Result:

"hits": [
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "Test this"
          ]
        }
      },
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "this is my test"
          ]
        }
      },
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "This is a test"
          ]
        }
      },
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "this test"
          ]
        }
      }
    ]

For fuzzy search you can use the search query below (note tst in place of test):

{
  "query": {
    "fuzzy": {
      "name": {
        "value": "tst"
      }
    }
  }
}
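As a rough illustration of why tst can still find test: the fuzzy query matches indexed terms within a small edit distance of the search term. The sketch below uses the classic Levenshtein distance and assumes the default AUTO fuzziness (no edits for terms of 1 or 2 characters, one edit for 3 to 5, two edits beyond that). Both function names are hypothetical, and Elasticsearch additionally counts a transposition of adjacent characters as a single edit, which this sketch does not.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Edits allowed under AUTO fuzziness, assuming the default thresholds."""
    if len(term) < 3:
        return 0
    if len(term) < 6:
        return 1
    return 2


# "tst" is one insertion away from the indexed term "test"
print(levenshtein("tst", "test") <= auto_fuzziness("tst"))
```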

Upvotes: 1

Bhavya

Reputation: 16192

The completion suggester only matches on prefixes, so the best way to match in the middle of a field is an n-gram filter.

Alternatively, you can request multiple suggestions in one query: one based on the prefix, and one that uses a regex to match in the middle of the field.

I am not familiar with django_elasticsearch_dsl, so I am adding a working example with the index mapping, index data, search query, and search result.

Index Mapping:

{
  "mappings": {
    "properties": {
      "name": {
        "type": "completion"
      }
    }
  }
}

Index Data:

{
  "name": {
    "input": ["Test this"]
  }
}
{
  "name": {
    "input": ["this is my test"]
  }
}
{
  "name": {
    "input": ["This is a test"]
  }
}
{
  "name": {
    "input": ["this test"]
  }
}

Search Query:

    {
        "suggest": {
            "suggest-exact": {
                "prefix": "test",
                "completion": {
                    "field": "name",
                    "skip_duplicates": true
                }
            },
            "suggest-regex": {
                "regex": ".*test.*",
                "completion": {
                    "field": "name",
                    "skip_duplicates": true
                }
            }
        }
    }
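If you want to build this request from Python, a minimal sketch could look like the following. The helper name build_suggest_body is made up, and the body is a plain dict that you would still have to send with whatever client you use. Lowercasing the input is my assumption: as far as I know the regex is matched against the analyzed (lowercased) form of the input, which would explain why ".*test.*" finds "Test this". The sketch also assumes the query contains no regex special characters.

```python
import json


def build_suggest_body(query, field="name"):
    """Build a suggest request with a prefix and a regex completion suggester."""
    text = query.strip().lower()  # assumption: indexed inputs are lowercased
    completion = {"field": field, "skip_duplicates": True}
    return {
        "suggest": {
            "suggest-exact": {"prefix": text, "completion": dict(completion)},
            "suggest-regex": {"regex": f".*{text}.*", "completion": dict(completion)},
        }
    }


print(json.dumps(build_suggest_body("test"), indent=2))
```

For the mapping in the question you would pass field="name.suggest" instead of the default.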

Search Result:

"suggest": {
    "suggest-exact": [
      {
        "text": "test",
        "offset": 0,
        "length": 4,
        "options": [
          {
            "text": "Test this",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "4",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "Test this"
                ]
              }
            }
          }
        ]
      }
    ],
    "suggest-regex": [
      {
        "text": ".*test.*",
        "offset": 0,
        "length": 8,
        "options": [
          {
            "text": "Test this",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "4",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "Test this"
                ]
              }
            }
          },
          {
            "text": "This is a test",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "1",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "This is a test"
                ]
              }
            }
          },
          {
            "text": "this is my test",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "2",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "this is my test"
                ]
              }
            }
          },
          {
            "text": "this test",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "3",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "this test"
                ]
              }
            }
          }
        ]
      }
    ]
}
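Note that "Test this" is returned by both suggesters. A small sketch of how the two option lists could be merged on the client side, keeping the prefix matches first and dropping repeated _id values (merge_suggestions is a hypothetical helper, and sample is a trimmed copy of the response above):

```python
def merge_suggestions(suggest, names=("suggest-exact", "suggest-regex")):
    """Collect option texts from several suggesters, skipping repeated _ids."""
    seen = set()
    merged = []
    for name in names:
        for entry in suggest.get(name, []):
            for option in entry.get("options", []):
                if option["_id"] not in seen:
                    seen.add(option["_id"])
                    merged.append(option["text"])
    return merged


# Trimmed version of the "suggest" section of the response
sample = {
    "suggest-exact": [{"options": [{"_id": "4", "text": "Test this"}]}],
    "suggest-regex": [{"options": [
        {"_id": "4", "text": "Test this"},
        {"_id": "1", "text": "This is a test"},
        {"_id": "2", "text": "this is my test"},
        {"_id": "3", "text": "this test"},
    ]}],
}

print(merge_suggestions(sample))
# ['Test this', 'This is a test', 'this is my test', 'this test']
```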

Upvotes: 1
