Reputation: 324
I'm sending this request
curl -XGET 'host/process_test_3/14/_search' -d '{
"query" : {
"query_string" : {
"query" : "\"*cor interface*\"",
"fields" : ["title", "obj_id"]
}
}
}'
And I'm getting correct result
{
"took": 12,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 5.421598,
"hits": [
{
"_index": "process_test_3",
"_type": "14",
"_id": "141_dashboard_14",
"_score": 5.421598,
"_source": {
"obj_type": "dashboard",
"obj_id": "141",
"title": "Cor Interface Monitoring"
}
}
]
}
}
But when I want to search by word part, as example
curl -XGET 'host/process_test_3/14/_search' -d '
{
"query" : {
"query_string" : {
"query" : "\"*cor inter*\"",
"fields" : ["title", "obj_id"]
}
}
}'
I'm getting no results back:
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : []
}
}
What am I doing wrong?
Upvotes: 2
Views: 2163
Reputation: 919
+1 to Val's solution.
Just wanted to add something.
Since your query is relatively simple, you may want to have a look at match
/match_phrase
queries. Match queries does have the regex parsing like query_string and are thus lighter.
You can find the details here: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
Upvotes: 1
Reputation: 217334
This is because your title
field has probably been analyzed by the standard analyzer (default setting) and the title Cor Interface Monitoring
has been tokenized as the three tokens cor
, interface
and monitoring
.
In order to search any substring of words, you need to create a custom analyzer which leverages the ngram token filter in order to also index all substrings of each of your tokens.
You can create your index like this:
curl -XPUT localhost:9200/process_test_3 -d '{
"settings": {
"analysis": {
"analyzer": {
"substring_analyzer": {
"tokenizer": "standard",
"filter": ["lowercase", "substring"]
}
},
"filter": {
"substring": {
"type": "nGram",
"min_gram": 2,
"max_gram": 15
}
}
}
},
"mappings": {
"14": {
"properties": {
"title": {
"type": "string",
"analyzer": "substring_analyzer"
}
}
}
}
}'
Then you can reindex your data. What this will do is that the title Cor Interface Monitoring
will now be tokenized as:
co
, cor
, or
in
, int
, inte
, inter
, interf
, etcmo
, mon
, moni
, etcso that your second search query will now return the document you expect because the tokens cor
and inter
will now match.
Upvotes: 6