Reputation: 393
I want to build an autocomplete feature using ElasticSearch and C#, but I am not getting the desired result. For demo purposes, this is what I have done.
1) Created index called "names":
PUT names?pretty
2) Added 20 entries using POST command:
POST names/_doc/1
{
"name" : "John Smith"
}
3) List of Names:
[ "John Smith", "John Smitha", "John Smithb", "John Smithc", "John Smithd", "John Smithe", "John Smithf",
"John Smithg", "John Smithh", "John Smithi", "Smith John", "Smitha John", "Smithb John", "Smithc John",
"Smithd John", "Smithe John", "Smithf John", "Smithg John", "Smithh John", "Smithi John" ]
4) When I run a prefix query:
GET names/_search
{
"query": {
"prefix": {
"name": {
"value": "Smith"
}
}
}
}
I expect to get back "Smith John", "Smitha John", ... But I am getting back "John Smith", "John Smitha", ...
What am I doing wrong? What do I need to change and where?
Upvotes: 2
Views: 1560
Reputation: 32376
You are defining your name field as a text field, which by default uses the standard analyzer and converts the tokens to lowercase. You can test this by using the analyze API of ES.
URL: http://{{hostname}}:{{port}}/{{index}}/_analyze
Request body using the keyword analyzer:
{
"text": "John Smith",
"analyzer" : "keyword"
}
The output of the above API:
{
"tokens": [
{
"token": "John Smith",
"start_offset": 0,
"end_offset": 10,
"type": "word",
"position": 0
}
]
}
Notice that the keyword analyzer does not break up the text; it stores it as is, as explained in the official ES documentation. Request body using the standard analyzer:
{
"text": "John Smith",
"analyzer" : "standard"
}
The output of the above API:
{
"tokens": [
{
"token": "john",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "smith",
"start_offset": 5,
"end_offset": 10,
"type": "<ALPHANUM>",
"position": 1
}
]
}
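The prefix query is not analyzed: the search term is sent to ES as is, so Smith (note the capital S) is what gets matched against the indexed tokens. With the updated mapping, which uses the keyword analyzer, each name is indexed as a single token, so only documents whose name starts with Smith have that prefix, and only these come back in the search results. The updated mapping, followed by the search query: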
{
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "keyword"
}
}
}
}
{
"query": {
"prefix": {
"name": {
"value": "Smith"
}
}
}
}
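To make the effect of the analyzer on prefix matching concrete, here is a minimal pure-Python sketch. It does not call Elasticsearch; standard_tokens and keyword_tokens are hypothetical helpers that only approximate the two analyzers (the real standard analyzer follows Unicode segmentation rules), and prefix_hits mimics a non-analyzed prefix match against indexed tokens.

```python
import re

def standard_tokens(text):
    # Rough approximation of the standard analyzer (assumption):
    # split on non-alphanumeric characters and lowercase every token.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def keyword_tokens(text):
    # The keyword analyzer emits the whole input as one unchanged token.
    return [text]

def prefix_hits(names, prefix, analyze):
    # The prefix itself is NOT analyzed: it is compared as typed
    # against each indexed token of each document.
    return [n for n in names
            if any(tok.startswith(prefix) for tok in analyze(n))]

names = ["John Smith", "John Smitha", "Smith John", "Smitha John"]

# Keyword analysis: only names that start with "Smith" match.
print(prefix_hits(names, "Smith", keyword_tokens))

# Standard analysis: a lowercase prefix matches the token "smith"
# wherever it appears in the name, so all four names match.
print(prefix_hits(names, "smith", standard_tokens))

# Standard analysis with a capitalized prefix: indexed tokens are
# lowercase, so nothing starts with "Smith".
print(prefix_hits(names, "Smith", standard_tokens))
```

Under these simplified rules, the first call returns only the two names beginning with Smith, which is the behavior the updated mapping is meant to produce.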
EDIT: Updated the mapping based on the OP's comments. With the above mapping and search query, only results that start with Smith are returned, as shown in the output below:
{
"took": 811,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 5,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "59977669",
"_type": "_doc",
"_id": "6",
"_score": 1.0,
"_source": {
"name": "Smith John"
}
},
{
"_index": "59977669",
"_type": "_doc",
"_id": "7",
"_score": 1.0,
"_source": {
"name": "Smithb John"
}
},
{
"_index": "59977669",
"_type": "_doc",
"_id": "8",
"_score": 1.0,
"_source": {
"name": "Smithc John"
}
},
{
"_index": "59977669",
"_type": "_doc",
"_id": "9",
"_score": 1.0,
"_source": {
"name": "Smithd John"
}
},
{
"_index": "59977669",
"_type": "_doc",
"_id": "10",
"_score": 1.0,
"_source": {
"name": "Smithe John"
}
}
]
}
}
Upvotes: 1
Reputation: 217254
You need to run your prefix query on the name.keyword field and not on the name field.
GET names/_search
{
"query": {
"prefix": {
"name.keyword": {
"value": "Smith"
}
}
}
}
The reason is that the name.keyword field is of type keyword and is not analyzed (i.e. one token, John Smith, is indexed), and hence you can perform an exact match query on it. The name field is of type text and is analyzed (i.e. two tokens, john and smith, are indexed), and hence your exact match (or prefix match) query doesn't work.
You can read more about it here
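For an autocomplete box, the same request body can be generated from whatever the user has typed so far. The sketch below only builds the JSON; build_prefix_query is a hypothetical helper name, and it assumes the index uses the default dynamic mapping, which is what creates the name.keyword sub-field.

```python
import json

def build_prefix_query(prefix, field="name.keyword"):
    # Build the body for GET names/_search.
    # The prefix is passed through unchanged because prefix queries
    # are not analyzed, so its case must match the indexed value.
    return {"query": {"prefix": {field: {"value": prefix}}}}

body = build_prefix_query("Smith")
print(json.dumps(body, indent=2))
```

The resulting body can be sent with any HTTP client or Elasticsearch client library. Keeping the field name as a parameter makes it easy to point the same helper at a differently mapped field later.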
Upvotes: 1