Reputation: 2817
I have an index of consumers with a keyword field "ssn" and a date field "created". I index consumers every day.
What I want to do is find all consumers in the index that share an "ssn", but only for the "ssn"s of the consumer that have been created on a specific date.
In pseudo sql,
select ssn, count(*) from consumer where (select ssn from consumer where created = '2024-11-06') group by ssn
e.g. I want to find all consumers created on '2024-11-06' whose ssn has more than one document throughout the index. The following query is incorrect. It first filters out the consumers for the specified date and then checks the doc count. So I only get the consumers for that date rather than all the consumers for other dates as well that have the same ssn.
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"created": {
"gte" : "2024-11-06",
"lte" : "2024-11-06||+1d"
}
}
}
]
}
},
"aggs": {
"shared_ssn": {
"terms": {
"field": "ssn",
"min_doc_count": 2,
"size": 10000
}
}
}
}
How do I write a query that will find all documents that have a shared ssn, but only for consumers indexed (created)on the specified date.
Almost got it using a filter aggregation
{
"size": 0,
"aggs": {
"our_buckets": {
"terms": {
"field": "ssnKeyword",
"min_doc_count": 2,
"size": 10000
},
"aggs": {
"created_date": {
"filter" : {
"range": {
"created": {
"gte" : "2024-11-06",
"lte" : "2024-11-06||+1d"
}
}
},
"aggs": {
"shared_ssn": {
"top_hits": {}
}
}
}
}
}
}
}
The only problem I have is that I in addition to the consumers that I want, I also get back buckets with no documents
{
"key": "123456789",
"doc_count": 2,
"created_date": {
"doc_count": 0,
"shared_ssn": {
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
}
}
I can filter it out in the client. But would be good to not get it at all. I tried adding a nested "terms" aggregation, but didn't know how to filter out the "created_date" aggregation.
Upvotes: 0
Views: 28