Stpete111

Reputation: 3427

Elasticsearch query to find duplicate values of one field and return the value of another, like GROUP BY

Elasticsearch 6.4: given an index whose documents have a CaptureId field and a SourceId field, we need to find duplicate records by CaptureId value. Many records can share the same SourceId value, and we want to return only one SourceId per set of duplicates found. The output should therefore be a list of distinct SourceIds, each listed once, that have one or more duplicated CaptureId values.

How would I write this query in Elasticsearch?

Here is the document mapping:

"mappings": {
  "fla_doc": {
    "_field_names": {
      "enabled": false
    },
    "properties": {
      "captureId": {
        "type": "long"
      },
      "capturedDateTime": {
        "type": "date"
      },
      "language": {
        "type": "text"
      },
      "sourceId": {
        "type": "long"
      },
      "sourceListType": {
        "type": "text"
      },
      "region": {
        "type": "text"
      }
    }
  }
}

Upvotes: 1

Views: 5002

Answers (1)

Joe - Check out my books

Reputation: 16895

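Both ID fields are mapped as long, and the terms aggregation works on numeric fields just as it does on keyword fields. Field names are case-sensitive, so reference captureId and sourceId exactly as they appear in the mapping: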

GET index_name/_search
{
  "size": 0,
  "aggs": {
    "by_duplicate_capture": {
      "terms": {
        "field": "captureId",
        "min_doc_count": 2
      },
      "aggs": {
        "by_underlying_source_ids": {
          "terms": {
            "field": "sourceId",
            "size": 1
          }
        }
      }
    }
  }
}

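The inner size of 1 returns a single sourceId per duplicated captureId (the one with the most documents in that bucket); increase it if you're interested in more sourceIds. The outer terms aggregation returns only the top 10 captureId buckets by default, so raise its size as well if you expect more duplicated captureIds than that.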
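Because the same sourceId can show up under several duplicated captureId buckets, the response still has to be deduplicated on the client to list each sourceId only once. Below is a minimal Python sketch, assuming the search response has already been parsed into a dict and uses the aggregation names from the query in this answer. The function name distinct_source_ids and the sample response are hypothetical and for illustration only; they are not real data.

```python
from typing import Any, Dict, List


def distinct_source_ids(response: Dict[str, Any]) -> List[Any]:
    """Return each sourceId once, in first-seen order, across all
    duplicate-captureId buckets.

    Assumes the standard terms-aggregation response shape and the
    aggregation names by_duplicate_capture / by_underlying_source_ids.
    """
    seen = set()
    ordered: List[Any] = []
    for capture_bucket in response["aggregations"]["by_duplicate_capture"]["buckets"]:
        # Each captureId bucket holds up to `size` sourceId sub-buckets.
        for source_bucket in capture_bucket["by_underlying_source_ids"]["buckets"]:
            source_id = source_bucket["key"]
            if source_id not in seen:
                seen.add(source_id)
                ordered.append(source_id)
    return ordered


# Hypothetical response: captureIds 101 and 102 both belong to sourceId 7.
sample = {
    "aggregations": {
        "by_duplicate_capture": {
            "buckets": [
                {"key": 101, "doc_count": 3,
                 "by_underlying_source_ids": {"buckets": [{"key": 7, "doc_count": 3}]}},
                {"key": 102, "doc_count": 2,
                 "by_underlying_source_ids": {"buckets": [{"key": 7, "doc_count": 2}]}},
                {"key": 205, "doc_count": 2,
                 "by_underlying_source_ids": {"buckets": [{"key": 9, "doc_count": 2}]}},
            ]
        }
    }
}

print(distinct_source_ids(sample))  # [7, 9]
```

Preserving first-seen order rather than returning a plain set keeps the output stable and in the same order as the aggregation buckets, which makes it easier to compare against the raw response.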

Upvotes: 1
