Reputation: 323
suppose following is mapping for employee working :
{
"Project":
{
"startDate":{"type":"Date"},
"endDate":{"type":"Date"},
"employees":{"type":"keyword"}
}
}
PFB sample data:
{
"Project1":
{
"startDate":"2019-07-01",
"endDate":"2019-07-03"
"employees":["emp1","emp2"]
}
},
{
"Project2":
{
"startDate":"2019-07-02",
"endDate":"2019-07-04"
"employees":["emp3","emp4"]
}
}
Here employees have list of employees working on that project. I am unable to write aggregation query which gives me number of employees working on each day. i.e. Employee is working when he is any one or more projects having startdate <= currentDate and endDate >=currentDate.
I want following result:
{
{
"key":"2019-07-01",
"EmployeeCount":2
},
{
"key":"2019-07-02",
"EmployeeCount":4
},
{
"key":"2019-07-03",
"EmployeeCount":4
},
{
"key":"2019-07-04",
"EmployeeCount":2
}
}
Can you guide me which aggregation will help me solve this?
Upvotes: 1
Views: 191
Reputation: 8840
I'm afraid what you are looking for is not possible.
Elasticsearch doesn't support the date histogram
aggregation in such a way to pick up the value from two different date fields, which in your case is startDate
and endDate
.
The only way to get what you want is:
startDate
endDate
Below is the sample aggregation for calculating the count of employees using startDate
.
PUT mysampleindex
{
"mappings": {
"properties": {
"project":{
"properties": {
"startDate":{
"type":"date"
},
"endDate":{
"type":"date"
},
"employees":{
"type":"keyword"
}
}
}
}
}
}
POST mysampleindex/_doc/1
{
"project":
{
"startDate":"2019-07-01",
"endDate":"2019-07-03",
"employees":["emp1","emp2"]
}
}
POST mysampleindex/_doc/2
{
"project":
{
"startDate":"2019-07-02",
"endDate":"2019-07-04",
"employees":["emp3","emp4"]
}
}
POST mysampleindex/_search
{
"size": 0,
"aggs": {
"mydates": {
"date_histogram": {
"field": "project.startDate",
"interval": "day",
"format" : "yyyy-MM-dd"
},
"aggs": {
"emp_count": {
"value_count": {
"field": "project.employees"
}
}
}
}
}
}
Note that I've used date histogram
aggregation with day
as interval along with the value_count
aggregation on employees
as its child aggregation.
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"mydates" : {
"buckets" : [
{
"key_as_string" : "2019-07-01",
"key" : 1561939200000,
"doc_count" : 1,
"emp_count" : { <---- Count of employees
"value" : 2
}
},
{
"key_as_string" : "2019-07-02",
"key" : 1562025600000,
"doc_count" : 1,
"emp_count" : { <---- Count of employees
"value" : 2
}
}
]
}
}
}
You can count the employees in similar way for endDate
(just replace startDate
with endDate
in above aggregation query). Once you have both the results, you could perform the addition of what you are looking for in your service layer.
Although technically this doesn't, I just hope this helps!
Upvotes: 2