user3290807
user3290807

Reputation: 391

Distinct records in Query matching for Elastic Search

I have the following record set in elastic search

POST /books/book/1
{
  "title" : "JavaScript: The Good Parts",
  "author" : "Douglas Crockford",
  "language" : "JavaScript",
  "publishYear" : 2009,
  "soldCopy" : "50"
}

POST /books/book/2
{
  "title" : "JavaScript: The Good Parts",
  "author" : "Douglas Crockford",
  "language" : "JavaScript",
  "publishYear" : 2009,
  "soldCopy" : "110"
}

POST /books/book/3
{
  "title" : "JavaScript: The Good Parts",
  "author" : "Douglas Crockford1",
  "language" : "JavaScript",
  "publishYear" : 2011,
  "soldCopy" : "2"
}

POST /books/book/4
{
  "title" : "JavaScript: The Good Parts",
  "author" : "Douglas Crockford2",
  "language" : "JavaScript",
  "publishYear" : 2012,
  "soldCopy" : "5"
}

I am using the following elastic search query to get the distinct title and author based on a given year 2009.The output for the query that i am expecting is

JavaScript: The Good Parts Douglas Crockford

but in the response i get back 2 records with same ouput like :

JavaScript: The Good Parts      Douglas Crockford
JavaScript: The Good Parts      Douglas Crockford

Query used for elastic search was :

{
  "query": {
    "match": {
      "publishYear": "2009"   }
  }
}

equivalent select query that i am try to create in database terms is :

select distinct title,author from book where publishYear = '2009'

How could i get the same output from elastic search as my sql query ? Thanks

Upvotes: 1

Views: 2206

Answers (1)

ChintanShah25
ChintanShah25

Reputation: 12672

Distinct in sql is equivalent to terms aggregation in elasticsearch.

{
  "query": {
    "match": {
      "publishYear": "2009"
    }
  },
  "aggs": {
    "unique_author": {
      "terms": {
        "field": "author",
        "size": 10
      }
    },
    "unique_book": {
      "terms": {
        "field": "title",
        "size": 10
      }
    }
  },
  "size": 0
}

For this to work you would have to make title and author field as not_analyzed or you could also use keyword tokenizer with lowercase token filter. The better option would be to make them multi fields

You could create index like this

PUT books
{
  "mappings": {
    "book":{
      "properties": {
        "title":{
          "type": "string",
          "fields": {
            "raw":{
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "author":{
          "type": "string",
          "fields": {
            "raw":{
              "type": "string",
              "index": "not_analyzed"
            }
          }
        },
        "language":{
          "type": "string"
        },
        "publishYear":{
          "type": "integer"
        },
        "soldCopy":{
          "type": "string"
        }
      }
    }
  }
}

Then use .raw in aggregation.

Upvotes: 1

Related Questions