Reputation: 149
I need to construct an Elasticsearch request that searches by part of a phrase (a case-insensitive search on a sequence of words).
For example, a record's field contains:
Lorem ipsum dolor sit amet, eam et gubergren vulputate
And I need to find this record using any of the following search terms:
Lorem ipsum
Lorem ipsum dolor
lorem, ipsum.dolor
dolor sit amet
Previously I used a strict (exact) search. My solution was to create a custom analyzer (Tokenizer = "keyword" and Filter = ["lowercase"]), assign it to the field, and set the field's index to analyzed in the mapping. But now the task has changed.
Can anybody help me create such a request? I would be glad for even a pointer to the relevant Elasticsearch API reference.
Upvotes: 0
Views: 138
Reputation: 22332
Check out the _analyze API.
By using the custom analyzer you describe (the lowercase filter with the keyword tokenizer), you are creating a single, large token:
$ curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&filters=lowercase&text=Lorem+ipsum+dolor+sit+amet,+eam+et+gubergren+vulputate'
{
  "tokens": [
    {
      "token": "lorem ipsum dolor sit amet, eam et gubergren vulputate",
      "start_offset": 0,
      "end_offset": 54,
      "type": "word",
      "position": 1
    }
  ]
}
The only way to match that token is to search for exactly the same token (after analysis, if the search text is analyzed).
However, if you did not use a custom analyzer at all, the default analyzer would produce these tokens:
$ curl -XGET 'localhost:9200/_analyze?text=Lorem+ipsum+dolor+sit+amet,+eam+et+gubergren+vulputate'
{
  "tokens": [
    {
      "token": "lorem",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "ipsum",
      "start_offset": 6,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "dolor",
      "start_offset": 12,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "sit",
      "start_offset": 18,
      "end_offset": 21,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "amet",
      "start_offset": 22,
      "end_offset": 26,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "eam",
      "start_offset": 28,
      "end_offset": 31,
      "type": "<ALPHANUM>",
      "position": 6
    },
    {
      "token": "et",
      "start_offset": 32,
      "end_offset": 34,
      "type": "<ALPHANUM>",
      "position": 7
    },
    {
      "token": "gubergren",
      "start_offset": 35,
      "end_offset": 44,
      "type": "<ALPHANUM>",
      "position": 8
    },
    {
      "token": "vulputate",
      "start_offset": 45,
      "end_offset": 54,
      "type": "<ALPHANUM>",
      "position": 9
    }
  ]
}
Now you can search for any word in the "sentence" and find matches, including using the phrase search.
Put more simply, you want to search with a match query to get the benefits of full-text search, because it runs the search terms through the same analyzer as the indexed field. If you use a term query (or filter), the search text is not analyzed and only exact tokens are matched.
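To make the difference concrete, here is a minimal sketch that builds both request bodies. The helper names (term_body, match_body, search) are hypothetical, and the index "test" and field "field" are simply the ones used in this example; it assumes a node on localhost:9200 and a document indexed with the default analyzer.

```shell
# Hypothetical helpers. Values are inserted verbatim, so they must not
# contain double quotes or backslashes.

# term query: the value is NOT analyzed; it is compared to the stored tokens as-is.
term_body() {
  printf '{ "query": { "term": { "%s": "%s" } } }' "$1" "$2"
}

# match query: the value is run through the field's analyzer before the lookup.
match_body() {
  printf '{ "query": { "match": { "%s": "%s" } } }' "$1" "$2"
}

# Sends a request body to the "test" index (assumed to exist).
# The Content-Type header is required by newer Elasticsearch versions.
search() {
  curl -s -XGET 'localhost:9200/test/_search' \
    -H 'Content-Type: application/json' -d "$1"
}

# search "$(term_body field 'Lorem')"    # expected to miss: the indexed token is "lorem"
# search "$(match_body field 'Lorem')"   # expected to hit: "Lorem" is lowercased first
```

The two search calls are left commented out so the helpers can be sourced without a running cluster.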
So, without any custom analyzer at all, you should be able to use those searches as-is to find the text. First, index the document:
$ curl -XPOST 'localhost:9200/test/type' -d '{
  "field" : "Lorem ipsum dolor sit amet, eam et gubergren vulputate"
}'
Then search using a plain match query:
$ curl -XGET 'localhost:9200/test/_search' -d '{
  "query" : {
    "match" : {
      "field" : "lorem, ipsum.dolor"
    }
  }
}'
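A plain match query matches when any of the analyzed terms is present, so by itself it does not enforce word order. If the words must appear as a sequence, a match_phrase query is the usual choice. The following is only a sketch: phrase_body and search_phrase are hypothetical helper names, the index "test" and field "field" come from the example above, and a node on localhost:9200 is assumed.

```shell
# Hypothetical helper: prints a match_phrase request body.
# Values are inserted verbatim, so they must not contain double quotes
# or backslashes.
phrase_body() {
  printf '{ "query": { "match_phrase": { "%s": "%s" } } }' "$1" "$2"
}

# Runs the phrase search against the "test" index (assumed to exist).
# The Content-Type header is required by newer Elasticsearch versions.
search_phrase() {
  curl -s -XGET 'localhost:9200/test/_search' \
    -H 'Content-Type: application/json' \
    -d "$(phrase_body "$1" "$2")"
}

# Usage, once the document above has been indexed:
# search_phrase field 'dolor sit amet'
```

One caveat worth checking with the _analyze API: the standard tokenizer may keep "ipsum.dolor" as a single token, in which case the phrase form of that particular search term would not match even though the plain match query still finds the document through "lorem".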
Upvotes: 1