Skiminock

Reputation: 149

Elasticsearch: filtering/search by part of a phrase

I need to construct an Elasticsearch request that searches by part of a phrase (a case-insensitive search for a sequence of words).

For example, record field contains:

Lorem ipsum dolor sit amet, eam et gubergren vulputate

And I need to find this record using any of the following search terms:

Lorem ipsum
Lorem     ipsum dolor
lorem, ipsum.dolor
dolor sit amet

Before this, I used a strict (exact) search. My solution was to create a custom analyzer (tokenizer = "keyword", filter = ["lowercase"]), assign it to the field, and mark the field as analyzed in the mapping. But now the requirements have changed.
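For reference, a sketch of index settings that would define such a lowercase-keyword analyzer (the analyzer name `keyword_lowercase` and the type/field names here are placeholders, not the actual ones from my index):

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "keyword_lowercase": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "type": {
      "properties": {
        "field": { "type": "string", "analyzer": "keyword_lowercase" }
      }
    }
  }
}
```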

Can anybody help me construct this request? I would be glad of even a pointer to the relevant Elasticsearch API reference.

Upvotes: 0

Views: 138

Answers (1)

pickypg

Reputation: 22332

Check out the _analyze API.

By using the noted custom analyzer (lowercase keyword), you are creating a single, large token:

$ curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&filters=lowercase&text=Lorem+ipsum+dolor+sit+amet,+eam+et+gubergren+vulputate'
{
   "tokens": [
      {
         "token": "lorem ipsum dolor sit amet, eam et gubergren vulputate",
         "start_offset": 0,
         "end_offset": 54,
         "type": "word",
         "position": 1
      }
   ]
}

The only way to find that token is to search for exactly the same token (post-analysis, if an analyzer is applied to the search input too).

However, if you did not use a custom analyzer at all (i.e., you relied on the default standard analyzer), then you would get these tokens:

$ curl -XGET 'localhost:9200/_analyze?text=Lorem+ipsum+dolor+sit+amet,+eam+et+gubergren+vulputate'
{
   "tokens": [
      {
         "token": "lorem",
         "start_offset": 0,
         "end_offset": 5,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "ipsum",
         "start_offset": 6,
         "end_offset": 11,
         "type": "<ALPHANUM>",
         "position": 2
      },
      {
         "token": "dolor",
         "start_offset": 12,
         "end_offset": 17,
         "type": "<ALPHANUM>",
         "position": 3
      },
      {
         "token": "sit",
         "start_offset": 18,
         "end_offset": 21,
         "type": "<ALPHANUM>",
         "position": 4
      },
      {
         "token": "amet",
         "start_offset": 22,
         "end_offset": 26,
         "type": "<ALPHANUM>",
         "position": 5
      },
      {
         "token": "eam",
         "start_offset": 28,
         "end_offset": 31,
         "type": "<ALPHANUM>",
         "position": 6
      },
      {
         "token": "et",
         "start_offset": 32,
         "end_offset": 34,
         "type": "<ALPHANUM>",
         "position": 7
      },
      {
         "token": "gubergren",
         "start_offset": 35,
         "end_offset": 44,
         "type": "<ALPHANUM>",
         "position": 8
      },
      {
         "token": "vulputate",
         "start_offset": 45,
         "end_offset": 54,
         "type": "<ALPHANUM>",
         "position": 9
      }
   ]
}

Now you can search for any word in the "sentence" and find matches, including using the phrase search.
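For an ordered sequence of words (what the question calls searching by "part of phrase"), a `match_phrase` query body along these lines should work (field name `field` follows the example below; adjust to your mapping):

```json
{
  "query" : {
    "match_phrase" : {
      "field" : "dolor sit amet"
    }
  }
}
```

The phrase query requires the terms to appear in order at consecutive positions, so `"dolor sit amet"` matches but `"amet sit dolor"` does not.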

Thinking about it more simply, though: you want to search with a match query to get the benefits of full-text search, because it applies the same analyzer to the search terms. If you use a term query (or filter), then it will only look for the exact tokens.
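To illustrate: a term query with an unanalyzed, capitalized value would find nothing against a standard-analyzed field, because the index contains only single lowercase tokens and the term query does not analyze its input:

```json
{
  "query" : {
    "term" : {
      "field" : "Lorem ipsum"
    }
  }
}
```

There is no single token `"Lorem ipsum"` in the index, so this matches no documents, while the match query below handles the same input correctly.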

So, without using any custom analyzer at all, you should be able to use those search terms as-is to find the text. First, index a document:

$ curl -XPOST 'localhost:9200/test/type' -d '{
  "field" : "Lorem ipsum dolor sit amet, eam et gubergren vulputate"
}'

Then search with a plain match query:

$ curl -XGET 'localhost:9200/test/_search' -d '{
  "query" : {
    "match" : {
      "field" : "lorem, ipsum.dolor"
    }
  }
}'

Upvotes: 1
