Reputation: 19
I have ElasticSearch index with records like:
{
"project" : "A",
"updated" : <date>,
"cost" : 123
},
{
"project" : "A",
"updated" : <date>,
"cost" : 1
},
{
"project" : "B",
"updated" : <date>,
"cost" : 3
},
{
"project" : "B",
"updated" : <date>,
"cost" : 4
},
{
"project" : "C",
"updated" : <date>,
"cost" : 5
}
I'm trying to draw "cost" chart by selected projects. Can anyone help me to build a query to get a sum of cost, grouped data by a project? F.e. I want to select data for projects "A" & "B" and get something like:
date1 ->
projectA -> sum(cost)
projectB -> sum(cost)
date2 ->
projectA -> sum(cost)
projectB -> sum(cost)
Have no idea how to modify this query, which extracts data for the one project:
"query": {
"bool": {
"must": [
{
"match": {
"project": {
"query": <project>,
"type": "phrase"
}
}
},
{
"range": {
"updated": {
"gte": <startDate>,
"format": "epoch_millis"
}
}
}
]
}
},
"aggs": {
"3": {
"date_histogram": {
"field": "End_Time",
"interval": "1M",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"2": {
"sum": {
"field": "cost"
}
}
}
}
}
Upd: Thanks guys! With your help I wrote the query:
{
"query": {
"bool": {
"must": [
{
"range": {
"End_Time": {
"gte": 1485892800000,
"format": "epoch_millis"
}
}
}
],
"should": [
{
"match": {
"Project_Name": {
"query": "A",
"type": "phrase"
}
}
},
{
"match": {
"Project_Name": {
"query": "B",
"type": "phrase"
}
}
}
]
}
},
"aggs": {
"3": {
"date_histogram": {
"field": "End_Time",
"interval": "1M",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"project_agg": {
"terms": {
"field": "Project_ID"
},
"aggs": {
"2": {
"sum": {
"field": "Cost"
}
}
}
}
}
}
}
}
But it returns something strange:
"aggregations": {
"3": {
"buckets": [
{
"key_as_string": "2017-02-01T00:00:00.000-06:00",
"key": 1485928800000,
"doc_count": 17095,
"project_agg": {
"doc_count_error_upper_bound": 36,
"sum_other_doc_count": 3503,
"buckets": [
{
"2": {
"value": 2536.8616891294323
},
"key": 834879987748,
"doc_count": 2243
},
{
"2": {
"value": 3438.766646153458
},
"key": 497952557271,
"doc_count": 1785
},
{
"2": {
"value": 13066.367076588496
},
"key": 1057394416300,
"doc_count": 1736
},
...
Here are 10 buckets for each month. I expect to see only 2 values for each project. What's wrong?
Upvotes: 1
Views: 59
Reputation: 116
You need to aggregate on project before you aggregate cost:
{
"aggs": {
"3": {
"date_histogram": {
"field": "End_Time",
"interval": "1M",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"2": {
"terms": {
"field": "project"
},
"aggs": {
"1": {
"sum": {
"field": "cost"
}
}
}
}
}
}
}
}
For the filtering it depends on how you want to do the search. For a list of projects you could use:
"query": {
"bool": {
"must": [
{ "terms": { "project": [ "a", "b" ] } } //Assuming field is mapped as "analyzed"
]
}
}
In case your mapping contains a .keyword variety, you would format the terms filter like this: { "terms": { "project.keyword": [ "A", "B" ] } } //Assuming field is mapped as "not_analyzed" or has a keyword field. Here is an example of how a field is mapped in ES 5.5 as "text" with a "keword" field:
"ShortTextContent" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
In this case I can access the analyzed version using "ShortTextContent" and the not_analyzed version using "ShortTextContent.keyword"
Upvotes: 1
Reputation: 751
The query you wrote gives you total cost(irrespective of project) per month, you need to have another aggregation to group by project in between aggregation 3
and aggregation 2
.
If you only want for projects A
and B
, then use filter in aggregation.
"size": 0,
"aggs": {
"project": {
"filter": {
"bool": {
"must": [
{
"terms": {
"project": [
"a",
"b"
]
}
}
]
}
},
"aggs": {
"3": {
"date_histogram": {
"field": "End_Time",
"interval": "1M",
"time_zone": "CST6CDT",
"min_doc_count": 1
},
"aggs": {
"project_agg": {
"terms": {
"field": "project"
},
"aggs": {
"2": {
"sum": {
"field": "cost"
}
}
}
}
}
}
}
}
}
Upvotes: 1