anubysh

Reputation: 586

match_phrase_prefix doesn't return the result obtained by match_phrase

The match phrase query

{
  "query": {
    "match_phrase": {
      "approved_labelled_products.companies": "SOMETHING INC"
    }
  }
}

returns a particular result but the match_phrase_prefix query

{
  "query": {
    "match_phrase_prefix": {
      "approved_labelled_products.companies": "SOME.*"
    }
  }
}

returns an empty result set:

"hits": {
  "total": 0,
  "max_score": null,
  "hits": []
}

The match_phrase_prefix query should at least return the data returned by the match_phrase query, but it doesn't.

The mapping for the data is as follows:

"approved_labelled_products": {
  "properties": {
    "companies": {
      "type": "keyword",
      "null_value": "NULL",
      "ignore_above": 9500
    }
  }
}

Upvotes: 3

Views: 4038

Answers (2)

Nikolay Vasiliev

Reputation: 6066

match_phrase and match_phrase_prefix are full-text search queries and expect the field to be of type text, which behaves very differently from the keyword type you are using. Let me explain what the difference is and what you can do.

Can I make match_phrase_prefix work?

Yes, you can use match_phrase_prefix if you change the type of the field to text.
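
As a rough sketch of that change, the mapping below maps the same field as text and adds a keyword sub-field for exact matching. The index name myindex_text and the sub-field name raw are made up for illustration, and null_value is moved to the sub-field because, as far as I know, text fields do not accept it. It is meant as the body of PUT myindex_text, using the _doc mapping type like the other examples here:

```json
{
  "mappings": {
    "_doc": {
      "properties": {
        "approved_labelled_products": {
          "properties": {
            "companies": {
              "type": "text",
              "fields": {
                "raw": {
                  "type": "keyword",
                  "null_value": "NULL",
                  "ignore_above": 9500
                }
              }
            }
          }
        }
      }
    }
  }
}
```

With a mapping like this, match_phrase_prefix with "SOME" on approved_labelled_products.companies should find "SOMETHING INC", while exact lookups can go to approved_labelled_products.companies.raw. Existing documents would need to be reindexed for the new mapping to take effect.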

How can I search for a prefix using keyword field?

A keyword field is stored and queried as-is, without any analysis. Think of it as a single string: to find all documents whose field value starts with a given prefix, a prefix query is enough.

Let's define our mapping and insert a couple of documents:

PUT myindex
{
  "mappings": {
    "_doc": {
      "properties": {
        "approved_labelled_products": {
          "properties": {
            "companies": {
              "type": "keyword",
              "null_value": "NULL",
              "ignore_above": 9500
            }
          }
        }
      }
    }
  }
}

POST myindex/_doc
{
  "approved_labelled_products": {
    "companies": "SOMETHING INC"
  }
}

Now we can issue a query like this:

POST myindex/_doc/_search
{
  "query": {
    "prefix": {
      "approved_labelled_products.companies": "SOME"
    }
  }
}

Note that, since there is literally no analysis performed, the request is case-sensitive, and querying by string "some" will not return results.

How is text field different?

A text field is analyzed at index time: the input string is split into tokens and (with the default analyzer) lowercased, some meta-information is saved, and an inverted index is built.

This makes it possible to efficiently fetch documents containing a certain token or combination of tokens.

To illustrate this we can use the _analyze API. Let's first see how Elasticsearch would analyze the data for a keyword field:

POST _analyze
{
  "analyzer" : "keyword",
  "text": "SOMETHING INC"
}

This will return:

{
  "tokens": [
    {
      "token": "SOMETHING INC",
      "start_offset": 0,
      "end_offset": 13,
      "type": "word",
      "position": 0
    }
  ]
}

As you can see, it is a single token with all capital letters.

Now let's see what the standard analyzer does (the one a text field uses by default):

POST _analyze
{
  "analyzer" : "standard",
  "text": "SOMETHING INC"
}

It will return:

{
  "tokens": [
    {
      "token": "something",
      "start_offset": 0,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "inc",
      "start_offset": 10,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

As you can see, it has produced two tokens, both lowercased.
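
These tokens are also what match_phrase_prefix works with. As a sketch, assuming the companies field were mapped as text (it is keyword in the question), a search body like the one below would itself be analyzed into the terms something and i. The last term is treated as a prefix, so the query should match the token sequence something, inc shown above:

```json
{
  "query": {
    "match_phrase_prefix": {
      "approved_labelled_products.companies": "SOMETHING I"
    }
  }
}
```

Because the query text goes through the same analyzer, the upper-case input here is lowercased before matching, unlike with the prefix query on a keyword field.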


Hope that helps!

Upvotes: 4

Nishant

Reputation: 7864

You don't need a wildcard expression in a match_phrase_prefix query.

Use this instead:

{
  "query": {
    "match_phrase_prefix": {
      "approved_labelled_products.companies": "SOME"
    }
  }
}

Upvotes: 0
