shivg

Reputation: 762

Elasticsearch: querying for multiple exact values

I have data stored in an Elasticsearch index like this:

{'name': 'Arnie Metz PhD', 'user_id': 'CL_000960', 'email_id': '[email protected]', 'customer_id': 'CL_2135514566_1427476813'}
{'name': 'Ms. Princess Bernhard', 'user_id': 'CL_000972', 'email_id': '[email protected]', 'customer_id': 'CL_2135514566_1427476810'}
{'name': "Lori O'Kon", 'user_id': 'CL_000980', 'email_id': '[email protected]', 'customer_id': 'CL_2135514566_1427476811'}
{'name': "Ahmad O'Reilly", 'user_id': 'CL_000981', 'email_id': '[email protected]', 'customer_id': 'CL_2135514566_1427476815'}
{'name': 'Lovell Connelly', 'user_id': 'CL_000982', 'email_id': '[email protected]', 'customer_id': 'CL_2135514566_1427476815'}
{'name': 'Errol Feest', 'user_id': 'CL_000989', 'email_id': '[email protected]', 'customer_id': 'CL_2135514566_1427476810'}
{'name': "May O'Conner", 'user_id': 'CL_000990', 'email_id': '[email protected]', 'customer_id': 'CL_2135514566_1427476815'}
{'name': 'Virgie Wyman', 'user_id': 'CL_000999', 'email_id': '[email protected]', 'customer_id': 'CL_2135514566_1427476812'}
{'name': 'Ofelia McClure', 'user_id': 'CL_0001001', 'email_id': '[email protected]', 'customer_id': 'CL_2135514566_1427476814'}
{'name': 'Mr. Edson Rosenbaum Jr.', 'user_id': 'CL_0001003', 'email_id': '[email protected]', 'customer_id': 'CL_2135514566_1427476810'}

What I am trying to get is the list of email ids for a given list of user_ids, using the queries below.

Query 1

As per the Elasticsearch documentation:

{
  "query" : {
    "filtered" : {
        "filter" : {
            "terms" : {
                "user_id" : ["CL_0004430", "CL_0004496"]
            }
        }
     }
   }
 }

This does not work: it returns an empty result.

Query 2

{
 "query": {
   "bool": {
     "must": [
      {
        "match": {
          "user_id": {
          "query": "['CL_00078','CL_00028']",
          "operator": "or"
          }
        }
      }
    ]
  }
 },
 "aggs": {}
}

This works as expected, but the number of values in the condition is limited: I cannot pass more than 1000 items in the list.

Is there a better way to query so that I can get more than 10,000 records in one query?

Upvotes: 1

Views: 1347

Answers (2)

Jeremy

Reputation: 1924

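This is a really good question. When storing identifiers such as user ids, it's usually better to map the field as not_analyzed. An analyzed string field is run through the default analyzer at index time (which, among other things, lowercases it), whereas a terms filter looks up the exact values you pass in, so the two do not match. With not_analyzed, an exact search returns the expected results. Using the following mapping, your terms query works as expected: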

POST test_users
{
  "mappings" :{
    "test_user":{
      "properties": {
        "name": { "type": "string" },
        "user_id": {"type": "string", "index": "not_analyzed"},
        "email_id": {"type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed" }}},
        "customer_id": { "type": "string", "index": "not_analyzed"}
      }
    }
  }
}

POST _bulk
{"create": {"_index": "test_users", "_type": "test_user" }}
{"name": "Arnie Metz PhD", "user_id": "CL_000960", "email_id": "[email protected]", "customer_id": "CL_2135514566_1427476813"}
{"create": {"_index": "test_users", "_type": "test_user" }}
{"name": "Ms. Princess Bernhard", "user_id": "CL_000972", "email_id": "[email protected]", "customer_id": "CL_2135514566_1427476810"}

# returns two results.
GET test_users/test_user/_search
{
 "query": {
    "filtered" : {
      "filter" : {
        "terms": {
          "user_id": ["CL_000960","CL_000972"]
        }
      }
    }
  }
}
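To illustrate why the mapping matters, here is a toy Python model of the lookup. It is only a sketch: the analyze function is a hypothetical stand-in that models nothing but lowercasing, not the real analyzer, and the helper names are made up for this example.

```python
def analyze(value):
    # Toy stand-in for the default analyzer: only lowercasing is modelled.
    return value.lower()

def build_index(user_ids, analyzed):
    """Return the set of terms that would be stored for the given ids."""
    return {analyze(u) if analyzed else u for u in user_ids}

def terms_filter(index_terms, wanted):
    """Exact lookup, like a terms filter: wanted values are NOT analyzed."""
    return [w for w in wanted if w in index_terms]

def match_query(index_terms, wanted):
    """Like a match query: wanted values are analyzed before the lookup."""
    return [w for w in wanted if analyze(w) in index_terms]

ids = ["CL_000960", "CL_000972"]
analyzed_index = build_index(ids, analyzed=True)   # stores "cl_000960", ...
raw_index = build_index(ids, analyzed=False)       # stores "CL_000960", ...

print(terms_filter(analyzed_index, ids))  # [] (exact values are not in the index)
print(match_query(analyzed_index, ids))   # ['CL_000960', 'CL_000972']
print(terms_filter(raw_index, ids))       # ['CL_000960', 'CL_000972']
```

This mirrors what the question describes: the match query (Query 2) finds the documents on an analyzed field, while the terms filter (Query 1) only does so once the field is stored as-is.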

You will also need to set index.query.bool.max_clause_count: 12000 (or some other large number) in your elasticsearch.yml config file and restart your instance. Otherwise you'll get TooManyClauses[maxClauseCount is set to 1024]; once the list grows past 1024 values.

In my own experiments on an Elasticsearch instance, passing 10,000 items in a terms array took about 1.5 seconds to return each page of 25 results. That was a single node running on a desktop workstation with a 4-core, 3.40 GHz processor and 8 GB of RAM. You may therefore want to consider a scan and scroll type query.
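If raising the clause limit is not an option, one alternative (not something I benchmarked above) is to split the id list into batches that stay under the default limit of 1024 and send one terms filter per batch. A minimal Python sketch that only builds the request bodies; the function names and the batch size of 1000 are assumptions for illustration, and "size" assumes each user_id matches at most one document:

```python
def chunked(values, size):
    """Yield consecutive slices of `values`, each at most `size` long."""
    for start in range(0, len(values), size):
        yield values[start:start + size]

def build_terms_queries(user_ids, batch_size=1000):
    """Build one filtered/terms search body per batch of user ids."""
    bodies = []
    for batch in chunked(user_ids, batch_size):
        bodies.append({
            # Only return the field we actually need.
            "_source": ["email_id"],
            # Assumption: one document per user_id, so this covers all hits.
            "size": len(batch),
            "query": {
                "filtered": {
                    "filter": {"terms": {"user_id": batch}}
                }
            },
        })
    return bodies

# Hypothetical ids, generated only to show the batching.
user_ids = ["CL_%06d" % i for i in range(2500)]
bodies = build_terms_queries(user_ids)
print(len(bodies))  # 3
print([len(b["query"]["filtered"]["filter"]["terms"]["user_id"]) for b in bodies])
# [1000, 1000, 500]
```

Each body can then be sent to test_users/test_user/_search with whatever client you use, and the email_id values collected from the hits of every batch.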

Upvotes: 3

AlainIb

Reputation: 4728

Did you try an "or" filter? (I don't know whether it has a limit, but a long query can be slow to send over a low-speed connection.)

{
    "query" : {
        "filtered" : {
            "filter" : {
                "or" : [{
                        "term" : {
                            "user_id" : "CL_0004430",
                            "_cache" : false
                        }
                    }, {
                        "term" : {
                            "user_id" : "CL_0004496",
                            "_cache" : false
                        }
                    }
                ]
            }
        }
    }
}

Upvotes: 0
