Reputation: 1533
I'm trying to write a query that will give me all the documents where the field "id" is of the form: "SOMETHING-SOMETHING-4SOMETHING-SOMETHING-SOMETHING".
For instance, ab-ba-4a-b-a is a valid id.
I wrote this query:
"query": {
  "regexp": {
    "id": {
      "value": ".*-.*-4.*-.*-.*"
    }
  }
}
It gets no hits. What's wrong with this? I can see many ids of this form.
Upvotes: 0
Views: 28
Reputation: 8860
If the id field is of type keyword, the regexp query should work fine.
However, if it is of type text, notice how Elasticsearch tokenizes the value internally:
POST /_analyze
{
  "text": "abc-abc-4bc-abc-abc",
  "analyzer": "standard"
}
{
  "tokens" : [
    {
      "token" : "abc",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "abc",
      "start_offset" : 4,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "4bc",
      "start_offset" : 8,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "abc",
      "start_offset" : 12,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "abc",
      "start_offset" : 16,
      "end_offset" : 19,
      "type" : "<ALPHANUM>",
      "position" : 4
    }
  ]
}
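Notice that it breaks the value abc-abc-4bc-abc-abc into 5 separate tokens, none of which contains a hyphen. A regexp query is matched against each indexed term individually, so a pattern that requires hyphens can never match any of them. Take a look at what Analysis and Analyzers are, and how they are applied only to text fields.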
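To make the effect concrete, here is a small Python sketch of the same idea. It is only an approximation: `rough_standard_tokens` is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters (the real standard analyzer is more involved), and Python's `re.fullmatch` is used to imitate the way a regexp query has to match an entire term.

```python
import re

# Same pattern as in the question; fullmatch() requires the whole
# string to match, which mimics matching against a whole term.
PATTERN = re.compile(r".*-.*-4.*-.*-.*")


def rough_standard_tokens(text):
    """Rough stand-in for the standard analyzer (an assumption, not the
    real implementation): lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9A-Za-z]+", text.lower()) if tok]


def matches_as_text(value):
    """Simulate a text field: the pattern is tried on each token separately."""
    return any(PATTERN.fullmatch(tok) for tok in rough_standard_tokens(value))


def matches_as_keyword(value):
    """Simulate a keyword field: the pattern is tried on the whole value."""
    return PATTERN.fullmatch(value) is not None


value = "abc-abc-4bc-abc-abc"
print(rough_standard_tokens(value))  # ['abc', 'abc', '4bc', 'abc', 'abc']
print(matches_as_text(value))        # False
print(matches_as_keyword(value))     # True
```

Under these assumptions, no individual token can satisfy a pattern that needs four hyphens, while the unanalyzed value matches as expected.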
The keyword datatype, by contrast, exists for cases where you do not want your text to be analyzed (i.e. broken into tokens); it stores the string value as-is, as a single term.
If your mapping is dynamic, ES by default creates two fields for string values: a text field and its keyword sibling, something like this:
{
  "mappings" : {
    "properties" : {
      "id" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }
        }
      }
    }
  }
}
In that case, just run your query against the id.keyword field:
POST <your_index_name>/_search
{
  "query": {
    "regexp": {
      "id.keyword": ".*-.*-4.*-.*-.*"
    }
  }
}
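If you are not sure which of the two cases applies to your index, the choice of field can be sketched in Python as below. This is only an illustration: `regexp_target` is a hypothetical helper, and it assumes the mapping has already been loaded into a dict with the same shape as the mapping shown earlier (a real mapping response may be wrapped in an extra level keyed by the index name).

```python
def regexp_target(mapping, field):
    """Return the field path to use in a regexp query, or None.

    Hypothetical helper: `mapping` is assumed to look like
    {"mappings": {"properties": {...}}}.
    """
    spec = mapping["mappings"]["properties"][field]

    # Case 1: the field itself is a keyword, so query it directly.
    if spec.get("type") == "keyword":
        return field

    # Case 2: a text field with a keyword sub-field (the dynamic default).
    for sub_name, sub_spec in spec.get("fields", {}).items():
        if sub_spec.get("type") == "keyword":
            return f"{field}.{sub_name}"

    # No unanalyzed version of the value is available.
    return None


dynamic_mapping = {
    "mappings": {
        "properties": {
            "id": {
                "type": "text",
                "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
            }
        }
    }
}
print(regexp_target(dynamic_mapping, "id"))  # id.keyword
```

If the helper returns None, the field has no keyword form at all, and the mapping would need to change before the pattern from the question could match.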
Hope that helps!
Upvotes: 1