Reputation: 464
I have a field named log.file.path
in Elasticsearch and it has /var/log/dev-collateral/uaa.2020-09-26.log
value, I tried to retrieve all logs that log.file.path
field starts with /var/log/dev-collateral/uaa
I used the below regexp but it doesn't work.
{
"regexp":{
"log.file.path": "/var/log/dev-collateral/uaa.*"
}
}
Upvotes: 1
Views: 348
Reputation: 464
I used this query and it works!
{
"query": {
"regexp": {
"log.file.path.keyword": {
"value": "/var/log/dev-collateral/uaa.*",
"flags": "ALL",
"max_determinized_states": 10000,
"rewrite": "constant_score"
}
}
}
}
Upvotes: 0
Reputation: 38502
Let's see why it is not working? I've indexed two documents using Kibana UI like below -
PUT myindex/_doc/1
{
"log.file.path" : "/var/log/dev-collateral/uaa.2020-09-26.log"
}
PUT myindex/_doc/2
{
"log.file.path" : "/var/log/dev-collateral/uaa.2020-09-26.txt"
}
When I try to see the tokens for of the text on log.file.path
field using _analyze
API
POST _analyze
{
"text": "/var/log/dev-collateral/uaa.2020-09-26.log"
}
It gives me,
{
"tokens" : [
{
"token" : "var",
"start_offset" : 1,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "log",
"start_offset" : 5,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "dev",
"start_offset" : 9,
"end_offset" : 12,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "collateral",
"start_offset" : 13,
"end_offset" : 23,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "uaa",
"start_offset" : 24,
"end_offset" : 27,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "2020",
"start_offset" : 28,
"end_offset" : 32,
"type" : "<NUM>",
"position" : 5
},
{
"token" : "09",
"start_offset" : 33,
"end_offset" : 35,
"type" : "<NUM>",
"position" : 6
},
{
"token" : "26",
"start_offset" : 36,
"end_offset" : 38,
"type" : "<NUM>",
"position" : 7
},
{
"token" : "log",
"start_offset" : 39,
"end_offset" : 42,
"type" : "<ALPHANUM>",
"position" : 8
}
]
}
You can see, Elasticsearch has split your input text into tokens when you insert them on your index. This is because elasticsearch uses standard analyzer when we index documents and it splits our document to small parts as a token, remove punctuations, lowercased text etc. That's whey your current regexp query doesn't work.
GET myindex/_search
{
"query": {
"match": {
"log.file.path": "var"
}
}
}
If you try this way it will work but for your case, you need to match every log.file.path that ends with .log So what do now? Just don't apply analyzers while indexing documents. The keyword type stores the string you provide as it is.
Create mapping with keyword
type,
PUT myindex2/
{
"mappings": {
"properties": {
"log.file.path": {
"type": "keyword"
}
}
}
}
Index documents,
PUT myindex2/_doc/1
{
"log.file.path" : "/var/log/dev-collateral/uaa.2020-09-26.log"
}
PUT myindex2/_doc/2
{
"log.file.path" : "/var/log/dev-collateral/uaa.2020-09-26.txt"
}
Search with regexp
,
GET myindex2/_search
{
"query": {
"regexp": {
"log.file.path": "/var/log/dev-collateral/uaa.2020-09-26.*"
}
}
}
Upvotes: 2