Reputation: 4660
I'm using django_elasticsearch_dsl.
My Document:
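import django_elasticsearch_dsl
from django_elasticsearch_dsl import fields
from django_elasticsearch_dsl.fields import TextField
from elasticsearch_dsl import analyzer

# Custom analyzer: strip HTML, then lowercase, remove stop words, and stem.
html_strip = analyzer(
    'html_strip',
    tokenizer='standard',
    filter=["lowercase", "stop", "snowball"],
    char_filter=["html_strip"]
)

class Document(django_elasticsearch_dsl.Document):
    name = TextField(
        analyzer=html_strip,
        fields={
            'raw': fields.KeywordField(),
            'suggest': fields.CompletionField(),
        }
    )
    ...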
My request:
_search = Document.search().suggest("suggestions", text=query, completion={'field': 'name.suggest'}).execute()
I have the following document "names" indexed:
"This is a test"
"this is my test"
"this test"
"Test this"
Now if I search for This is my test, I will receive only
"this is my test"
However, if I search for test, then all I get is
"Test this"
even though I want all documents that have test in their name.
What am I missing?
Upvotes: 0
Views: 1202
Reputation: 16192
Based on the user's comment, here is another answer that uses n-grams.
Below is a working example with the index mapping, index data, search query, and search result.
Index Mapping:
{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 4,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    },
    "max_ngram_diff": 50
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}
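The mapping sets max_ngram_diff because Elasticsearch limits the gap between max_gram and min_gram to 1 by default, and the filter above uses a gap of 16. A tiny sketch of that check (the function and constant names are illustrative, not part of any Elasticsearch client):

```python
# Elasticsearch's default for the index.max_ngram_diff setting.
DEFAULT_MAX_NGRAM_DIFF = 1


def ngram_settings_ok(min_gram, max_gram, max_ngram_diff=DEFAULT_MAX_NGRAM_DIFF):
    """Return True if the gram range fits within the allowed difference."""
    return max_gram - min_gram <= max_ngram_diff


# With the default limit the filter above would be rejected.
print(ngram_settings_ok(4, 20))
# With "max_ngram_diff": 50 from the mapping it is accepted.
print(ngram_settings_ok(4, 20, max_ngram_diff=50))
```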
Index Data:
{
  "name": ["Test this"]
}
{
  "name": ["This is a test"]
}
{
  "name": ["this is my test"]
}
{
  "name": ["this test"]
}
Analyze API:
POST /stof_64281341/_analyze
{
  "analyzer": "ngram_analyzer",
  "text": "this is my test"
}
The following tokens are generated:
{
  "tokens": [
    {
      "token": "this",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "test",
      "start_offset": 11,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
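To see why only "this" and "test" come out, here is a rough pure-Python emulation of the ngram_analyzer above. It is only a sketch: it splits on whitespace instead of using the real standard tokenizer, and ngram_tokens is a made-up helper name. Words shorter than min_gram ("is", "my") produce no tokens at all.

```python
def ngram_tokens(text, min_gram=4, max_gram=20):
    """Roughly emulate: standard tokenizer -> lowercase -> ngram filter."""
    tokens = []
    # Simplification: whitespace split stands in for the standard tokenizer.
    for word in text.lower().split():
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    tokens.append(word[start:start + size])
    return tokens


names = ["This is a test", "this is my test", "this test", "Test this"]
for name in names:
    # Every name yields the token "test", so a match query for "test" hits all four.
    print(name, "->", ngram_tokens(name))
```

Because every indexed name contains the token "test", a match query analyzed with the standard search analyzer finds all four documents regardless of where the word appears.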
Search Query:
{
  "query": {
    "match": {
      "name": "test"
    }
  }
}
Search Result:
"hits": [
  {
    "_index": "stof_64281341",
    "_type": "_doc",
    "_id": "4",
    "_score": 0.2876821,
    "_source": {
      "name": ["Test this"]
    }
  },
  {
    "_index": "stof_64281341",
    "_type": "_doc",
    "_id": "3",
    "_score": 0.2876821,
    "_source": {
      "name": ["this is my test"]
    }
  },
  {
    "_index": "stof_64281341",
    "_type": "_doc",
    "_id": "2",
    "_score": 0.2876821,
    "_source": {
      "name": ["This is a test"]
    }
  },
  {
    "_index": "stof_64281341",
    "_type": "_doc",
    "_id": "1",
    "_score": 0.2876821,
    "_source": {
      "name": ["this test"]
    }
  }
]
For fuzzy search, you can use the search query below (tst is used in place of test):
{
  "query": {
    "fuzzy": {
      "name": {
        "value": "tst"
      }
    }
  }
}
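As a rough illustration of why tst can still find test: assuming the query runs with AUTO fuzziness, terms of 3 to 5 characters are allowed one edit, and tst is one insertion away from test. The sketch below uses plain Levenshtein distance (Elasticsearch additionally counts transpositions as a single edit); both function names are illustrative.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Edits allowed by AUTO fuzziness for a term of this length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


query = "tst"
print(levenshtein(query, "test") <= auto_fuzziness(query))
```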
Upvotes: 1
Reputation: 16192
The best way to get completion-style suggestions that also match in the middle of a field is to use an n-gram filter.
Alternatively, you can run multiple suggesters in one request: one based on the prefix, and one that uses a regex to match in the middle of the field.
I am not familiar with django_elasticsearch_dsl, so here is a working example with the index mapping, data, search query, and search result.
Index Mapping:
{
  "mappings": {
    "properties": {
      "name": {
        "type": "completion"
      }
    }
  }
}
Index Data:
{
  "name": {
    "input": ["Test this"]
  }
}
{
  "name": {
    "input": ["this is my test"]
  }
}
{
  "name": {
    "input": ["This is a test"]
  }
}
{
  "name": {
    "input": ["this test"]
  }
}
Search Query:
{
  "suggest": {
    "suggest-exact": {
      "prefix": "test",
      "completion": {
        "field": "name",
        "skip_duplicates": true
      }
    },
    "suggest-regex": {
      "regex": ".*test.*",
      "completion": {
        "field": "name",
        "skip_duplicates": true
      }
    }
  }
}
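The difference between the two suggesters can be mimicked in plain Python. This is only a sketch of the matching behaviour, not of how Elasticsearch stores completion inputs: it assumes the completion field lowercases its input (as the default simple analyzer does) and that the regex has to match the whole input, which is why the pattern needs the leading and trailing .* parts. The helper names are made up.

```python
import re

NAMES = ["Test this", "this is my test", "This is a test", "this test"]


def prefix_suggest(prefix, names):
    """Keep names whose lowercased form starts with the prefix."""
    return [n for n in names if n.lower().startswith(prefix.lower())]


def regex_suggest(pattern, names):
    """Keep names whose lowercased form matches the pattern in full."""
    compiled = re.compile(pattern)
    return [n for n in names if compiled.fullmatch(n.lower())]


print(prefix_suggest("test", NAMES))     # only the name that starts with "test"
print(regex_suggest(".*test.*", NAMES))  # every name containing "test"
```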
Search Result:
"suggest": {
  "suggest-exact": [
    {
      "text": "test",
      "offset": 0,
      "length": 4,
      "options": [
        {
          "text": "Test this",
          "_index": "stof_64281341",
          "_type": "_doc",
          "_id": "4",
          "_score": 1.0,
          "_source": {
            "name": {
              "input": ["Test this"]
            }
          }
        }
      ]
    }
  ],
  "suggest-regex": [
    {
      "text": ".*test.*",
      "offset": 0,
      "length": 8,
      "options": [
        {
          "text": "Test this",
          "_index": "stof_64281341",
          "_type": "_doc",
          "_id": "4",
          "_score": 1.0,
          "_source": {
            "name": {
              "input": ["Test this"]
            }
          }
        },
        {
          "text": "This is a test",
          "_index": "stof_64281341",
          "_type": "_doc",
          "_id": "1",
          "_score": 1.0,
          "_source": {
            "name": {
              "input": ["This is a test"]
            }
          }
        },
        {
          "text": "this is my test",
          "_index": "stof_64281341",
          "_type": "_doc",
          "_id": "2",
          "_score": 1.0,
          "_source": {
            "name": {
              "input": ["this is my test"]
            }
          }
        },
        {
          "text": "this test",
          "_index": "stof_64281341",
          "_type": "_doc",
          "_id": "3",
          "_score": 1.0,
          "_source": {
            "name": {
              "input": ["this test"]
            }
          }
        }
      ]
    }
  ]
}
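Since "Test this" comes back from both suggesters, the client probably wants to merge the two lists. A minimal sketch of that, assuming the response has been parsed into a plain dict shaped like the result above (merge_suggestions is a hypothetical helper, and sample_response is a trimmed copy of that result):

```python
def merge_suggestions(response, suggester_names):
    """Collect option texts from several suggesters, de-duplicated by _id."""
    seen_ids = set()
    merged = []
    for name in suggester_names:
        for entry in response.get("suggest", {}).get(name, []):
            for option in entry.get("options", []):
                if option["_id"] not in seen_ids:
                    seen_ids.add(option["_id"])
                    merged.append(option["text"])
    return merged


# Trimmed copy of the search result, keeping only the fields used here.
sample_response = {
    "suggest": {
        "suggest-exact": [
            {"text": "test", "options": [{"text": "Test this", "_id": "4"}]}
        ],
        "suggest-regex": [
            {"text": ".*test.*", "options": [
                {"text": "Test this", "_id": "4"},
                {"text": "This is a test", "_id": "1"},
                {"text": "this is my test", "_id": "2"},
                {"text": "this test", "_id": "3"},
            ]}
        ],
    }
}

# Prefix matches are listed first, followed by the remaining regex matches.
print(merge_suggestions(sample_response, ["suggest-exact", "suggest-regex"]))
```

Listing the prefix suggester first keeps exact-prefix matches at the top of the merged list.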
Upvotes: 1