Reputation: 381
I had a query that behaved as expected on a few dozen records. We've since started feeding more data into our ES instance, and now I am not getting any results back:
First query:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "message": "new connection attempt failed: null"
          }
        }
      ]
    }
  }
}
I get a number of records back. This shows the records are actually in my index as I expect.
If I pick one of the records from the result:
{ "_index": "logstash-2018.04.12", "_type": "log", "_id": "AWK3J1xarbUl8ovcY8uv", "_score": 6.621839, "_source": { "cluster": "dev-east-1-c5", "offset": 35858135, "level": "ERROR", ... }
and then add a term filter to only get the entries for a specific cluster, I get nothing back (but only when the index gets loaded up with more than a couple thousand records).
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "message": "new connection attempt failed: null"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "cluster": "dev-east-1-c5"
          }
        }
      ]
    }
  }
}
To describe in plain English what I am trying to do:
message -- match any entry whose message contains the given string
then filter those to return only entries where the cluster name is an exact match.
Edit 4/12/18 -- Adding mapping for log type as requested
{
  "logstash-2018.04.12":{
    "mappings":{
      "log":{
        "_all":{
          "enabled":true,
          "norms":false
        },
        "dynamic_templates":[
          {
            "message_field":{
              "path_match":"message",
              "match_mapping_type":"string",
              "mapping":{
                "norms":false,
                "type":"text"
              }
            }
          },
          {
            "string_fields":{
              "match":"*",
              "match_mapping_type":"string",
              "mapping":{
                "fields":{
                  "keyword":{
                    "ignore_above":256,
                    "type":"keyword"
                  }
                },
                "norms":false,
                "type":"text"
              }
            }
          }
        ],
        "properties":{
          "@timestamp":{
            "type":"date",
            "include_in_all":false
          },
          "@version":{
            "type":"keyword",
            "include_in_all":false
          },
          "application_name":{
            "type":"text",
            "norms":false,
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "application_version":{
            "type":"text",
            "norms":false,
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "beat":{
            "properties":{
              "hostname":{
                "type":"text",
                "norms":false,
                "fields":{
                  "keyword":{
                    "type":"keyword",
                    "ignore_above":256
                  }
                }
              },
              "name":{
                "type":"text",
                "norms":false,
                "fields":{
                  "keyword":{
                    "type":"keyword",
                    "ignore_above":256
                  }
                }
              },
              "version":{
                "type":"text",
                "norms":false,
                "fields":{
                  "keyword":{
                    "type":"keyword",
                    "ignore_above":256
                  }
                }
              }
            }
          },
          "cluster":{
            "type":"text",
            "norms":false,
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "geoip":{
            "dynamic":"true",
            "properties":{
              "ip":{
                "type":"ip"
              },
              "latitude":{
                "type":"half_float"
              },
              "location":{
                "type":"geo_point"
              },
              "longitude":{
                "type":"half_float"
              }
            }
          },
          "host":{
            "type":"text",
            "norms":false,
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "input_type":{
            "type":"text",
            "norms":false,
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "level":{
            "type":"text",
            "norms":false,
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "level_value":{
            "type":"long"
          },
          "logger_name":{
            "type":"text",
            "norms":false,
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "message":{
            "type":"text",
            "norms":false
          },
          "offset":{
            "type":"long"
          },
          "source":{
            "type":"text",
            "norms":false,
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "tags":{
            "type":"text",
            "norms":false,
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "thread_name":{
            "type":"text",
            "norms":false,
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          },
          "type":{
            "type":"text",
            "norms":false,
            "fields":{
              "keyword":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          }
        }
      }
    }
  }
}
Upvotes: 1
Views: 2313
Reputation: 381
There were two issues:
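The first issue was mentioned in my comment. I was running the term filter against plain "cluster" rather than "cluster.keyword". Per the mapping, "cluster" is a text field, so its values go through an analyzer at index time, while a term query looks up the exact value without analysis. As a result I was not getting hits on exact matches. (Filtering on the keyword sub-field appears to be the approach post 2.x.)
The second issue was the bool match on message. match has no notion of position: by default it matches documents containing any of the analyzed terms, in any order, which was giving all sorts of unexpected results on large data sets. The fix was to change the bool match to a bool match_phrase and then update the filter accordingly.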
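For reference, combining both fixes gives a query along the lines of the sketch below. The final query is not shown in this answer, so this is a reconstruction: it assumes the same message text and cluster value as in the question, and the "cluster.keyword" sub-field defined in the posted mapping.

```json
{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "message": "new connection attempt failed: null"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "cluster.keyword": "dev-east-1-c5"
          }
        }
      ]
    }
  }
}
```

The match_phrase clause requires the analyzed terms to appear together and in order, and the term filter on the keyword sub-field compares against the unanalyzed cluster value.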
It seems to be working as I want now. I am somewhat concerned there may be a more performant way to do this. I saw some people were using wildcards and I believe this is a slight improvement over that. Not sure if there is a guru approach of which I am not aware.
Upvotes: 1