Reputation: 850
For instance, given this result for a search, reduced to a size of 3 hits for brevity:
{
"hits": {
"total": {
"value": 51812937,
"relation": "eq"
},
"max_score": 1.0,
"hits": [
{
"_index": "desc-imunizacao",
"_type": "_doc",
"_id": "7d0ac34a-1d4f-435a-9e5f-6dc2d77bb251-i0b0",
"_score": 1.0,
"_source": {
"vacina_descricao_dose": " 2ª Dose",
"estabelecimento_uf": "BA",
"document_id": "7d0ac34a-1d4f-435a-9e5f-6dc2d77bb251-i0b0"
}
},
{
"_index": "desc-imunizacao",
"_type": "_doc",
"_id": "2dc55c6a-5ac1-4550-8990-5ca611808e8a-i0b0",
"_score": 1.0,
"_source": {
"vacina_descricao_dose": " 1ª Dose",
"estabelecimento_uf": "SE",
"document_id": "2dc55c6a-5ac1-4550-8990-5ca611808e8a-i0b0"
}
},
{
"_index": "desc-imunizacao",
"_type": "_doc",
"_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0",
"_score": 1.0,
"_source": {
"vacina_descricao_dose": " 1ª Dose",
"estabelecimento_uf": "SE",
"document_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0"
}
}
]
}
}
If I wanted to query for "estabelecimento_uf": "SE"
and keep only one result for duplicates of "document_id"
, I would issue:
{
"_source": ["document_id", "estabelecimento_uf", "vacina_descricao_dose"],
"query": {
"match": {
"estabelecimento_uf": {
"query": "SE"
}
}
},
"collapse": {
"field": "document_id",
"inner_hits": {
"name": "latest",
"size": 1
}
}
}
Is there a way to achieve this with Elasticsearch's Count API? Meaning: count query for field A (estabelecimento_uf) and count for unique values of field B (document_id), knowing that document_id
has duplicates of course.
This is a public API: https://imunizacao-es.saude.gov.br/_search
This is the authentication:
User: imunizacao_public
Pass: qlto5t&7r_@+#Tlstigi
Upvotes: 1
Views: 210
Reputation: 16172
You can use a combination of filter aggregation along with cardinality aggregation, to get a count of unique document id based on a filter
{
"size": 0,
"aggs": {
"filter_agg": {
"filter": {
"term": {
"estabelecimento_uf.keyword": "SE"
}
},
"aggs": {
"count_docid": {
"cardinality": {
"field": "document_id.keyword"
}
}
}
}
}
}
As far as I know, you cannot get the count of distinct field values using count API, you can either use field collapsing feature (as done in the question) OR use cardinality aggregation
Adding a working example with index data, search query and search result
{
"vacina_descricao_dose": " 2ª Dose",
"estabelecimento_uf": "BA",
"document_id": "7d0ac34a-1d4f-435a-9e5f-6dc2d77bb251-i0b0"
}
{
"vacina_descricao_dose": " 1ª Dose",
"estabelecimento_uf": "SE",
"document_id": "2dc55c6a-5ac1-4550-8990-5ca611808e8a-i0b0"
}
{
"vacina_descricao_dose": " 1ª Dose",
"estabelecimento_uf": "SE",
"document_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0"
}
{
"vacina_descricao_dose": " 1ª Dose",
"estabelecimento_uf": "SE",
"document_id": "d7e9b381-2873-4d0a-8b2d-5fa5034b7a80-i0b0"
}
Search Query 1:
{
"size": 0,
"query": {
"match": {
"estabelecimento_uf": "SE"
}
},
"aggs": {
"count_doc_id": {
"cardinality": {
"field": "document_id.keyword"
}
}
}
}
Search Result:
"aggregations": {
"count_doc_id": {
"value": 2 // note this
}
}
Search Query 2:
{
"size": 0,
"aggs": {
"filter_agg": {
"filter": {
"term": {
"estabelecimento_uf.keyword": "SE"
}
},
"aggs": {
"count_docid": {
"cardinality": {
"field": "document_id.keyword"
}
}
}
}
}
}
Search Result:
"aggregations": {
"filter_agg": {
"doc_count": 3,
"count_docid": {
"value": 2 // note this
}
}
}
Upvotes: 1