Reputation: 247
I need a query that returns only result that has 1 bucket.
The query below returns me the access data of a visitor grouped by day.
{
"size" : 0,
"query" : {
"filtered" : {
"filter" : {
"bool" : {
"must" : [
{
"range" : {
"start_time" : {
"gte" : "2019-02-06 00:00:00",
"lte" : "2019-02-11 23:59:59"
}
}
}
]
}
}
}
},
"aggs" : {
"UNIQUE" : {
"terms" : {
"size" : 0,
"field" : "username"
},
"aggs" : {
"visits" : {
"date_histogram" : {
"field" : "start_time",
"interval" : "day",
"format" : "yyyy-MM-dd"
}
}
}
}
}
}
I need to know which ones returned only once in the period. So when you have only 1 bucket, it's ONE. And if it has visited for more than a day (buckets> 1) then it is RECURRENT.
Upvotes: 0
Views: 157
Reputation: 247
Solution found was:
{
"size" : 0,
"query" : {
{
"range" : {
"start_time" : {
"gte" : "2019-02-11 00:00:00",
"lte" : "2019-02-11 23:59:59"
}
}
}
},
"aggs" : {
"UNIQUE" : {
"terms" : {
"size" : 0,
"field" : "username"
},
"aggs":{
"visit_date": {
"date_histogram": {
"field" : "start_time",
"interval" : "day",
"format" : "yyyy-MM-dd"
}
},
"count": {
"cardinality": {
"script": "new Date(doc['start_time'].value).format('yyyy-MM-dd')"
}
},
"equal_one":{
"bucket_selector":{
"buckets_path":{
"count":"count.value"
},
"script":"count == 1"
}
}
}
}
}
}
But performance remains a problem. In an environment with about 1 million records this query does not work very well.
Maybe some query using Scripted Metrics would solve, but demand more analysis (doc: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-scripted-metric-aggregation.html)
Upvotes: 0
Reputation: 8860
If I understand it correctly, you'd want a list of users
who have had a unique date or like visited only once in a particular time frame and you'd want both the details, date
and the username
to be in the aggregation
.
I've created a sample mapping, sample documents, aggregation query and how it would appear in the response
PUT mytest
{
"mappings": {
"mydocs": {
"properties": {
"username": {
"type": "keyword"
},
"start_time": {
"type": "date",
"format": "yyyy-MM-dd"
}
}
}
}
}
You can see that I've created 6 documents where John
has visited twice on same date, Jack visits site on two different dates, while Jane
and Rob
visited only once in the time-frame for which I will write an aggregation.
POST mytest/mydocs/1
{
"username": "john",
"start_time": "2018-08-01"
}
POST mytest/mydocs/2
{
"username": "john",
"start_time": "2018-08-01"
}
POST mytest/mydocs/3
{
"username": "jane",
"start_time": "2018-08-01"
}
POST mytest/mydocs/4
{
"username": "rob",
"start_time": "2018-08-01"
}
POST mytest/mydocs/5
{
"username": "jack",
"start_time": "2018-08-01"
}
POST mytest/mydocs/6
{
"username": "jack",
"start_time": "2018-08-02"
}
Note I've added two more documents with username Jack
who visits the site on two different dates, username John
visits the site twice on the same date.
POST mytest/_search
{
"size": 0,
"query": {
"range": {
"start_time": {
"gte": "2017-08-01",
"lte": "2019-08-01"
}
}
},
"aggs": {
"myterms": {
"terms": {
"size": 100,
"field": "username"
},
"aggs": {
"visit_date": {
"date_histogram": {
"field": "start_time",
"interval" : "day",
"format" : "yyyy-MM-dd"
}
},
"count": {
"cardinality": {
"field": "start_time"
}
},
"equal_one":{
"bucket_selector":{
"buckets_path":{
"count":"count.value"
},
"script":"params.count == 1"
}
}
}
}
}
}
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 6,
"max_score": 0,
"hits": []
},
"aggregations": {
"myterms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "john",
"doc_count": 2,
"count": {
"value": 1
},
"visit_date": {
"buckets": [
{
"key_as_string": "2018-08-01",
"key": 1533081600000,
"doc_count": 2
}
]
}
},
{
"key": "jane",
"doc_count": 1,
"count": {
"value": 1
},
"visit_date": {
"buckets": [
{
"key_as_string": "2018-08-01",
"key": 1533081600000,
"doc_count": 1
}
]
}
},
{
"key": "rob",
"doc_count": 1,
"count": {
"value": 1
},
"visit_date": {
"buckets": [
{
"key_as_string": "2018-08-01",
"key": 1533081600000,
"doc_count": 1
}
]
}
}
]
}
}
}
You can see that John now appears in the result as expected even if he has visited site multiple times on same date.
Let me know if you have any queries.
Upvotes: 1