Neo
Neo

Reputation: 5228

Difference of two query results in Elasticsearch

Let's say we've indexes of e-commerce store data, and we want to get the difference of list of products which are present in 2 stores.

Information on the index content: A sample data stored in each document looks like below:

{
   "product_name": "sample 1",
   "store_slug": "store 1",
   "sales_count": 42,
   "date": "2018-04-04"
}

Below are queries which gets me all products present in 2 stores individually,

Data for store 1

curl -XGET 'localhost:9200/store/_search?pretty' -H 'Content-Type: application/json' -d'
   {
       "_source": ["product_name"],
       "query": {
           "constant_score" : {
               "filter" : {
                    "bool" : {
                       "must" : [
                           { "term" : { "store_slug" : "store_1"}}]}}}}}'

Data for store 2

curl -XGET 'localhost:9200/store/_search?pretty' -H 'Content-Type: application/json' -d'
   {
       "_source": ["product_name"],
       "query": {
           "constant_score" : {
               "filter" : {
                    "bool" : {
                       "must" : [
                           { "term" : { "store_slug" : "store_2"}}]}}}}}'

Is it possible with elasticsearch query to get the difference of both result(without doing using some script/ other languages)?

E.g. of above operation: Let's say "store 1" is selling products ["product 1", "product 2"] and "store 2" is selling products ["product 1", "product 3"], So expected output of difference of products of "store 1" and "store 2" is "product 2".

Upvotes: 0

Views: 1925

Answers (1)

Val
Val

Reputation: 217424

Why not doing it in a single query?

Products that are in store 1 but not in store 2:

curl -XGET 'localhost:9200/store/_search?pretty' -H 'Content-Type: application/json' -d '{
  "_source": [
    "product_name"
  ],
  "query": {
    "constant_score": {
      "filter": {
        "bool": {
          "filter": [
            {
              "term": {
                "store_slug": "store_1"
              }
            }
          ],
          "must_not": [
            {
              "term": {
                "store_slug": "store_2"
              }
            }
          ]
        }
      }
    }
  }
}'

You can easily do the opposite, too.

UPDATE

After reading your updates, I think the best way to solve this is using terms aggregations, first by product and then by store and only select the products for which there is only a single store bucket (using a pipeline aggregation)

curl -XGET 'localhost:9200/store/_search?pretty' -H 'Content-Type: application/json' -d '{
{
  "size": 0,
  "aggs": {
    "products": {
      "terms": {
        "field": "product_name"
      },
      "aggs": {
        "stores": {
          "terms": {
            "field": "store_slug"
          }
        },
        "min_bucket_selector": {
          "bucket_selector": {
            "buckets_path": {
              "count": "stores._bucket_count"
            },
            "script": {
              "source": "params.count == 1"
            }
          }
        }
      }
    }
  }
}'

Upvotes: 1

Related Questions