sagar agarwal

Reputation: 142

How to exclude a large set of ids from an elasticsearch result?

I have a lot of Products indexed in elasticsearch. I need to exclude a list of ids (that I am fetching from a SQL database), from a query in elasticsearch. Suppose Products are stored as,

{
  "id" : "1",
  "name" : "shirt",
  "size" : "xl"
}

We show a list of recommended products to a customer based on some algorithm using elasticsearch. If a customer marks a product as 'Not Interested', we should not show them that product again. We keep such products in a separate SQL table with product_id, customer_id and status 'not_interested'.

Now, while fetching recommendations for a customer at runtime, we get the list of 'not_interested' products from the SQL database and send the array of product_ids in a not filter to elasticsearch to exclude them from the recommendations. The problem arises when the product_ids array becomes too large.

How should I store the product_id and customer_id mapping in elasticsearch so that I can filter out the 'not_interested' products at runtime using elasticsearch only?

Would it make sense to store them as nested objects or parent/child documents? Or is there some completely different way to store them so that I can exclude some ids from the result efficiently?

Upvotes: 3

Views: 4461

Answers (3)

Thomas Decaux

Reputation: 22671

Use the "ids" query:

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html

{
    "query": {
        "ids" : {
            "type" : "my_type",
            "values" : ["1", "4", "100"]
        }
    }
}

Wrap it inside the must_not clause of a bool query.
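
As a rough illustration, here is a minimal Python sketch of the wrapped request body, assuming the excluded ids come back from the SQL lookup as a list. The function name build_exclusion_query and the optional base_query argument are hypothetical, not part of any Elasticsearch client. The "type" parameter from the snippet above is left out on the assumption that a recent, typeless Elasticsearch version is in use; add it back if your version needs it.

```python
import json


def build_exclusion_query(excluded_ids, base_query=None):
    """Return a search body that drops documents whose _id is in excluded_ids.

    base_query (hypothetical) is whatever query produces the recommendations.
    """
    bool_clause = {
        "must_not": [
            # The ids query matches on the document id, so ids are sent as strings.
            {"ids": {"values": [str(i) for i in excluded_ids]}}
        ]
    }
    if base_query is not None:
        bool_clause["must"] = [base_query]
    return {"query": {"bool": bool_clause}}


body = build_exclusion_query([1, 4, 100], {"match": {"name": "shirt"}})
print(json.dumps(body, indent=2))
```

The resulting dict can be passed as the request body of a search call. Note that this only works if the product id is also the Elasticsearch document id.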

Upvotes: 1

Eli

Reputation: 4926

Add a terms query under the must_not section of a bool query, like the following:

{
  "query": {
    "bool": {
      "must_not": [
        {
          "terms": {
            "id": [
              "1",
              "3",
              "5"
            ]
          }
        }
      ]
    }
  }
}

Upvotes: 0

drjz

Reputation: 657

You can exclude IDs (or any other literal strings) efficiently using a terms query.

Both Elasticsearch and Solr have this. It is very powerful and very efficient.

In Elasticsearch, this is the ids query, which is in fact a terms query on the _uid field (the _id field in newer versions). Make sure you use it in a must_not clause (mustNot in the Java API) within a bool query. See: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html

In Solr you can use the terms query within an fq, like fq=-{!terms f=id}doc334,doc125,doc777,doc321,doc253. Note the leading minus, which negates the filter. See: http://yonik.com/solr-terms-query/
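
A small Python sketch of how that fq value could be assembled from a list of ids. The helper name build_solr_exclusion_fq is hypothetical, and the sketch assumes the ids contain no commas, since the comma is used as the separator here.

```python
def build_solr_exclusion_fq(field, ids):
    """Build a negated terms filter such as -{!terms f=id}doc334,doc125.

    Returns None for an empty list so the caller can skip the fq entirely.
    """
    if not ids:
        return None
    # The leading minus negates the filter; ids are joined with commas.
    return "-{!terms f=" + field + "}" + ",".join(ids)


fq = build_solr_exclusion_fq("id", ["doc334", "doc125", "doc777"])
print(fq)  # -{!terms f=id}doc334,doc125,doc777
```

The value still needs to be URL-encoded when it is sent as a request parameter.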

Upvotes: 4
