Reputation: 423
Thanks to the fantastic Kibana frontend for my Elasticsearch indexes, I was able to construct a query that pulls an hour-by-hour count of records over a specific timespan:
{
"facets": {
"0": {
"date_histogram": {
"field": "@timestamp",
"interval": "1h"
},
"global": true,
"facet_filter": {
"fquery": {
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*"
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"from": "2014-08-01T07:00:00.000Z",
"to": "2014-09-01T06:59:59.999Z"
}
}
},
{
"fquery": {
"query": {
"query_string": {
"query": "tags:\"solr_search\""
}
},
"_cache": true
}
}
]
}
}
}
}
}
}
}
},
"size": 0
}
Which gives me output like:
{
"took" : 27,
"timed_out" : false,
"_shards" : {
"total" : 155,
"successful" : 155,
"failed" : 0
},
"hits" : {
"total" : 267366,
"max_score" : 0.0,
"hits" : [ ]
},
"facets" : {
"0" : {
"_type" : "date_histogram",
"entries" : [ {
"time" : 1406876400000,
"count" : 120
}, {
"time" : 1406880000000,
"count" : 115
}, {
"time" : 1406883600000,
"count" : 134
}, {
"time" : 1406887200000,
"count" : 87
}, {
"time" : 1406890800000,
"count" : 99
}, {
"time" : 1406894400000,
"count" : 141
}, {
"time" : 1406898000000,
"count" : 168
}, {
"time" : 1406901600000,
"count" : 300
}, {
"time" : 1406905200000,
"count" : 782
}, {
"time" : 1406908800000,
"count" : 1085
}, {
And (using Kibana's help again) I can, for one specific time bucket, get a top 10 list of the most searched terms with a query like this:
{
"facets": {
"terms": {
"terms": {
"field": "searchstring.raw",
"size": 10,
"order": "count",
"exclude": []
},
"facet_filter": {
"fquery": {
"query": {
"filtered": {
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "*"
}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"range": {
"@timestamp": {
"from": 1406876400000,
"to": 1406880000000
}
}
},
{
"fquery": {
"query": {
"query_string": {
"query": "tags:\"solr_search\""
}
},
"_cache": true
}
}
]
}
}
}
}
}
}
}
},
"size": 0
}
Which gives results like this:
{
"took" : 56,
"timed_out" : false,
"_shards" : {
"total" : 155,
"successful" : 155,
"failed" : 0
},
"hits" : {
"total" : 267366,
"max_score" : 0.0,
"hits" : [ ]
},
"facets" : {
"terms" : {
"_type" : "terms",
"missing" : 0,
"total" : 120,
"other" : 86,
"terms" : [ {
"term" : "term1",
"count" : 11
}, {
"term" : "term2",
"count" : 4
}, {
"term" : "term3",
"count" : 3
}, {
"term" : "term4",
"count" : 3
}, {
"term" : "term5",
"count" : 3
}, {
"term" : "term6",
"count" : 2
}, {
"term" : "term7",
"count" : 2
}, {
"term" : "term8",
"count" : 2
}, {
"term" : "term9",
"count" : 2
}, {
"term" : "term10",
"count" : 2
} ]
}
}
}
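For reference, the `from` and `to` values in that range filter are the first histogram entry's `time` and the same value plus one hour in milliseconds. A small Python sketch of that mapping (the helper names `bucket_range` and `to_iso` are my own, not anything Elasticsearch provides):

```python
from datetime import datetime, timezone

HOUR_MS = 60 * 60 * 1000  # one "1h" histogram interval, in milliseconds


def bucket_range(bucket_time_ms, interval_ms=HOUR_MS):
    """Return the (from, to) epoch-millisecond bounds for one histogram bucket."""
    return bucket_time_ms, bucket_time_ms + interval_ms


def to_iso(epoch_ms):
    """Render an epoch-millisecond timestamp as an ISO 8601 UTC string."""
    return datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc).isoformat()


# First entry of the date_histogram output above.
start, end = bucket_range(1406876400000)
print(start, end)
print(to_iso(start), to_iso(end))
```

The first bound lines up with the `2014-08-01T07:00:00.000Z` start of the range in the first query, which is why the terms facet reports a `total` of 120, the same as the first bucket's `count`.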
What I would like to do is this: for each time bucket in the first query's output, pull the top 10 terms for that bucket and include them in that bucket's output. I'm still fairly new to the Elasticsearch query language, and my attempts so far at merging the two queries have gone down in flames. If anyone has any pointers, I would appreciate it.
Upvotes: 0
Views: 340
Reputation: 423
I ended up ditching the facets approach for the newer aggregations syntax, which lets a terms aggregation nest inside each date_histogram bucket. Here's what eventually returned what I was looking for:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "@timestamp": {
                  "from": "2014-08-01T00:00:00.000Z",
                  "to": "2014-09-01T00:00:00.000Z"
                }
              }
            },
            {
              "fquery": {
                "query": {
                  "query_string": {
                    "query": "tags:\"solr_search\""
                  }
                },
                "_cache": true
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "searches_per_hour": {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h",
        "format": "yyyy-MM-dd ha"
      },
      "aggs": {
        "top_search_terms": {
          "terms": {
            "field": "searchstring.raw",
            "size": 10,
            "shard_size": 300
          }
        }
      }
    }
  }
}
Maybe this will shorten someone else's work day in the future :)
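For anyone consuming the response in code: each `searches_per_hour` bucket carries its own `top_search_terms` bucket list. Below is a minimal Python sketch for flattening that, assuming the usual aggregation response layout (`aggregations`, then the aggregation name, then `buckets` with `key`, `key_as_string` and `doc_count`). The helper name `top_terms_per_hour` is hypothetical and the sample response is made up for illustration, not real output from my cluster.

```python
def top_terms_per_hour(response):
    """Map each hourly bucket label to its list of (term, count) pairs."""
    result = {}
    for hour in response["aggregations"]["searches_per_hour"]["buckets"]:
        # key_as_string is present when the date_histogram sets a "format";
        # fall back to the raw epoch-millisecond key otherwise.
        label = hour.get("key_as_string", hour["key"])
        result[label] = [
            (term["key"], term["doc_count"])
            for term in hour["top_search_terms"]["buckets"]
        ]
    return result


# Illustrative response fragment (assumed shape, invented values).
sample_response = {
    "aggregations": {
        "searches_per_hour": {
            "buckets": [
                {
                    "key_as_string": "2014-08-01 7AM",
                    "key": 1406876400000,
                    "doc_count": 120,
                    "top_search_terms": {
                        "buckets": [
                            {"key": "term1", "doc_count": 11},
                            {"key": "term2", "doc_count": 4},
                        ]
                    },
                }
            ]
        }
    }
}

hourly = top_terms_per_hour(sample_response)
for label, terms in hourly.items():
    print(label, terms)
```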
Upvotes: 1