Reputation: 586
The match_phrase query
{
  "query": {
    "match_phrase": {
      "approved_labelled_products.companies": "SOMETHING INC"
    }
  }
}
returns a particular result, but the match_phrase_prefix query
{
  "query": {
    "match_phrase_prefix": {
      "approved_labelled_products.companies": "SOME.*"
    }
  }
}
returns an empty result set:
"hits": {
  "total": 0,
  "max_score": null,
  "hits": []
}
The match_phrase_prefix query should return at least the documents returned by the match_phrase query, but it doesn't.
The mapping for the data is as follows:
"approved_labelled_products": {
  "properties": {
    "companies": {
      "type": "keyword",
      "null_value": "NULL",
      "ignore_above": 9500
    }
  }
}
Upvotes: 3
Views: 4038
Reputation: 6066
match_phrase and match_phrase_prefix queries are full-text search queries and are intended for fields of type text. That is very different from the keyword type you are using. Let me explain what you can do and how the two types differ.
Can match_phrase_prefix work?
Yes, you can use match_phrase_prefix if you change the type of the field to text.
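As a rough sketch of what that change could look like, here are the request bodies built as plain Python dictionaries. The mapping reuses the field name from the question and the _doc mapping-type layout used in this answer; the text mapping and the phrase_prefix_query helper are my own illustration, not something from the question. I left out null_value and ignore_above because, as far as I know, they are keyword-field options. Also keep in mind that the type of an existing field cannot be changed in place, so this would go into a new index.

```python
import json

# Hypothetical mapping: the same field as in the question, but typed as text.
# null_value and ignore_above are omitted on the assumption that they only
# apply to keyword fields.
text_mapping = {
    "mappings": {
        "_doc": {
            "properties": {
                "approved_labelled_products": {
                    "properties": {
                        "companies": {"type": "text"}
                    }
                }
            }
        }
    }
}


def phrase_prefix_query(field, text):
    """Build a match_phrase_prefix request body (illustrative helper)."""
    return {"query": {"match_phrase_prefix": {field: text}}}


# No wildcard characters: the last word of the input is treated as a prefix.
query = phrase_prefix_query("approved_labelled_products.companies", "SOME")

print(json.dumps(text_mapping, indent=2))
print(json.dumps(query, indent=2))
```

The printed JSON can be sent with whatever client you already use; nothing here talks to a cluster.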
How to search by prefix in a keyword field?
A keyword field is stored and queried as-is, without any analysis. Think of it as a single string; to find all documents whose field value starts with a given prefix, a prefix query is enough.
Let's define our mapping and insert a couple of documents:
PUT myindex
{
  "mappings": {
    "_doc": {
      "properties": {
        "approved_labelled_products": {
          "properties": {
            "companies": {
              "type": "keyword",
              "null_value": "NULL",
              "ignore_above": 9500
            }
          }
        }
      }
    }
  }
}
POST myindex/_doc
{
  "approved_labelled_products": {
    "companies": "SOMETHING INC"
  }
}
Now we can issue a query like this:
POST myindex/_doc/_search
{
  "query": {
    "prefix": {
      "approved_labelled_products.companies": "SOME"
    }
  }
}
Note that, since no analysis is performed at all, the query is case-sensitive, and searching for the string "some" will not return results.
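To make the case-sensitivity point concrete, here is a small Python sketch that imitates how a prefix lookup behaves against un-analyzed keyword values. It is only a toy model, not Elasticsearch code: keyword_prefix_search and the sample values are made up for illustration.

```python
# Toy model of a prefix query on a keyword field (not Elasticsearch code).
# A keyword value is indexed as one unmodified term, so a prefix lookup
# is a plain, case-sensitive "starts with" test on the whole string.

def keyword_prefix_search(values, prefix):
    """Return the stored values that start with `prefix`, compared as-is."""
    return [value for value in values if value.startswith(prefix)]


# Hypothetical field values, each stored as a single term.
companies = ["SOMETHING INC", "Something Else Ltd", "OTHER CORP"]

print(keyword_prefix_search(companies, "SOME"))    # ['SOMETHING INC']
print(keyword_prefix_search(companies, "some"))    # []
print(keyword_prefix_search(companies, "SOME.*"))  # []
```

In this model the characters ".*" are just part of the prefix string, so "SOME.*" matches nothing.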
How is a text field different?
A text field is analyzed at index time, which means the input string is split into tokens, lowercased, some meta-information is saved, and an inverted index is constructed.
This makes it possible to efficiently fetch documents that contain a certain token or combination of tokens.
To illustrate this we can use the _analyze API. Let's first see how Elasticsearch would analyze the data for a keyword field:
POST _analyze
{
  "analyzer": "keyword",
  "text": "SOMETHING INC"
}
This will return:
{
  "tokens": [
    {
      "token": "SOMETHING INC",
      "start_offset": 0,
      "end_offset": 13,
      "type": "word",
      "position": 0
    }
  ]
}
As you can see, it is a single token with all capital letters.
Now let's see what the standard analyzer does (the one that a text field uses by default):
POST _analyze
{
  "analyzer": "standard",
  "text": "SOMETHING INC"
}
It will return:
{
  "tokens": [
    {
      "token": "something",
      "start_offset": 0,
      "end_offset": 9,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "inc",
      "start_offset": 10,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
As you can see, it has produced two tokens, both lowercased.
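If it helps to see why tokens matter for phrase-prefix matching, here is a simplified Python model of the two analyzers and of a phrase-prefix check over tokens. Everything in it is an assumption made for illustration: simple_standard_analyze only splits on non-alphanumeric characters and lowercases (the real standard analyzer is more elaborate), and phrase_prefix_match is my own approximation of the idea, not how Elasticsearch implements it.

```python
import re


def keyword_analyze(text):
    """Keyword-style analysis: the whole input becomes one token, unchanged."""
    return [text]


def simple_standard_analyze(text):
    """Very rough stand-in for the standard analyzer: split on anything
    that is not a letter or digit, then lowercase each token."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def phrase_prefix_match(doc_tokens, query_tokens):
    """True if the query tokens occur consecutively in the document,
    with the last query token allowed to be only a prefix."""
    if not query_tokens:
        return False
    size = len(query_tokens)
    for start in range(len(doc_tokens) - size + 1):
        window = doc_tokens[start:start + size]
        if window[:-1] == query_tokens[:-1] and window[-1].startswith(query_tokens[-1]):
            return True
    return False


doc_tokens = simple_standard_analyze("SOMETHING INC")
print(keyword_analyze("SOMETHING INC"))  # ['SOMETHING INC']
print(doc_tokens)                        # ['something', 'inc']

# The query text goes through the same analysis before matching.
print(phrase_prefix_match(doc_tokens, simple_standard_analyze("SOME")))         # True
print(phrase_prefix_match(doc_tokens, simple_standard_analyze("something i")))  # True
print(phrase_prefix_match(doc_tokens, simple_standard_analyze("inc some")))     # False
```

Because both the document and the query are lowercased into tokens in this model, the search no longer depends on case, unlike the keyword case above.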
Hope that helps!
Upvotes: 4
Reputation: 7864
You don't have to use a wildcard expression in a match_phrase_prefix query.
Use this instead:
{
  "query": {
    "match_phrase_prefix": {
      "approved_labelled_products.companies": "SOME"
    }
  }
}
Upvotes: 0