altralaser

Reputation: 2073

Result filter and pagination in Elasticsearch

I need some help or an idea for the correct procedure.
I have already indexed a large number of documents. Now I have found that some documents have almost the same content, e.g.

{
  "title": "myDocument",
  "date": "2017-09-18",
  "page": 1
}

{
  "title": "myDocument",
  "date": "2017-09-18",
  "page": 2
}

The title field is mapped as text, date as date, and page as integer. As you can see, the only difference is the page value.
Now I want to run a query and filter out these duplicates. Field collapsing seems like a good way to do it, but then I can't get the correct count of results, and that's important to me.
Another way would be to fetch all results first and then filter them "manually", but then I have a problem with pagination.
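
For reference, here is a minimal sketch of the field-collapsing variant, written in Python as a plain request body. It assumes a `title.keyword` subfield exists (collapsing needs a keyword or numeric field, not a text field). The function name and the `distinct_groups` aggregation name are made up for illustration. With collapsing, `hits.total` still counts all matching documents, so a `cardinality` aggregation on the same field is one way to estimate the number of collapsed groups. That count is approximate.

```python
import json


def build_collapsed_search(query_text, page_number, page_size=10,
                           group_field="title.keyword"):
    """Build a search body that collapses hits on group_field.

    group_field is an assumption: it must be a keyword or numeric field.
    """
    return {
        "query": {"match": {"title": query_text}},
        # Return only one hit per distinct value of group_field.
        "collapse": {"field": group_field},
        # Approximate number of distinct groups, usable as the result count.
        "aggs": {
            "distinct_groups": {"cardinality": {"field": group_field}}
        },
        # Pagination is applied to the collapsed hits.
        "from": page_number * page_size,
        "size": page_size,
    }


body = build_collapsed_search("myDocument", page_number=0)
print(json.dumps(body, indent=2))
```

This only collapses on the title. If duplicates must match on both title and date, a combined key would have to be indexed as its own field, which is not shown here.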

Upvotes: 0

Views: 591

Answers (1)

Hatim Stovewala

Reputation: 1251

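Try nested terms aggregations, something like this. Note that title.keyword assumes title has a keyword subfield, while date and page are aggregated directly because they are mapped as date and integer and have no .keyword subfield.

GET index/type/_search
{
  "aggs": {
    "count_by_title_date_page": {
      "terms": {
        "field": "title.keyword",
        "size": 100
      },
      "aggs": {
        "date": {
          "terms": {
            "field": "date",
            "size": 100
          },
          "aggs": {
            "page": {
              "terms": {
                "field": "page",
                "size": 100
              }
            }
          }
        }
      }
    }
  }
}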
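
To use the result on the client side, the nested buckets can be flattened. Below is a rough Python sketch that walks the standard `buckets` / `key` / `doc_count` layout of a terms aggregation response. The helper names and the sample response are hand-written for illustration, not real output from the index.

```python
def flatten_buckets(response):
    """Turn nested title > date > page buckets into flat tuples."""
    rows = []
    titles = response["aggregations"]["count_by_title_date_page"]["buckets"]
    for title_bucket in titles:
        for date_bucket in title_bucket["date"]["buckets"]:
            # Date buckets carry epoch millis in "key"; prefer the readable form.
            date_key = date_bucket.get("key_as_string", date_bucket["key"])
            for page_bucket in date_bucket["page"]["buckets"]:
                rows.append((title_bucket["key"], date_key,
                             page_bucket["key"], page_bucket["doc_count"]))
    return rows


def count_distinct_documents(response):
    """Count distinct (title, date) pairs, ignoring the page value."""
    return len({(title, date)
                for title, date, _page, _count in flatten_buckets(response)})


# Hand-written sample shaped like a terms aggregation response (illustrative only).
sample_response = {
    "aggregations": {
        "count_by_title_date_page": {
            "buckets": [{
                "key": "myDocument", "doc_count": 2,
                "date": {"buckets": [{
                    "key": 1505692800000,
                    "key_as_string": "2017-09-18T00:00:00.000Z",
                    "doc_count": 2,
                    "page": {"buckets": [
                        {"key": 1, "doc_count": 1},
                        {"key": 2, "doc_count": 1},
                    ]},
                }]},
            }]
        }
    }
}

rows = flatten_buckets(sample_response)
distinct = count_distinct_documents(sample_response)
```

Keep in mind that each level only returns the top 100 buckets because of `"size": 100`, so on a large index the counts derived this way can be incomplete.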

Upvotes: 0
