SSG
SSG

Reputation: 1505

Update ElasticSearch documents matching specific criteria

I want to bulk update the documents matching the criteria.
ES version : 5.1.1
Index: index_1234
Type: addresses
URL: POST http://localhost:9200/index_1234/addresses/_update_by_query
Payload:

{
    "id":1,
    "address":"temp address"
}

I am using following inline script to update the documents

{
  "script": {
     "inline": "if(ctx._source.containsKey(\"address\") && ctx._source.address == "temp address"){ctx._source.address='perm address'}"
  }
}

i.e If "address" field has value as "temp address" , I am replacing it with "perm address"

This script works perfectly and updates the matching documents only. However I have a doubt

Suppose there are total 10 documents out of which 5 have "address" field as "temp address" and 5 have "address" field as "perm address"

On executing above script it gives following o/p

{
    "took": 131,
    "timed_out": false,
    "total": 10,
    **"updated": 10**,
    "deleted": 0,
    "batches": 1,
    "version_conflicts": 0,
    "noops": 0,
    "retries": {
        "bulk": 0,
        "search": 0
    },
    "throttled_millis": 0,
    "requests_per_second": -1,
    "throttled_until_millis": 0,
    "failures": []
}

so although it updated only 5 documents (I have double checked), in final response it says "updated": 10. I was expecting "updated":5 , am I missing something ? is it updating anything else ? I can see "_version" getting updated for all the documents i.e. even for documents which don't have matching string.
Thanks in advance !

UPDATE:

Thanks a lot Mike for quick response :)
According to Mike field check in the update query should be updated as follows.

"query": {
  "exists": {
    "field": "address"
  }
}

However there is 1 more issue.
Initially I had kept the fields to minimum for simplicity of the question. Actually there are many fields in the payload and I want to update 3 of them based on some condition.

So , the payload is as follows

{
    "id":12,
    "address":"temp address",
    "email":"temp email",
    "phone":"temp phone",
    .
    .
    .
}

And I am using following script to update all 3 fields

{
  "script": {
     "inline": "if(ctx._source.containsKey(\"address\") && ctx._source.address == "temp address"){ctx._source.address='perm address'}if(ctx._source.containsKey(\"email\") && ctx._source.email == "temp email"){ctx._source.email='perm email'}if(ctx._source.containsKey(\"phone\") && ctx._source.phone == "temp phone"){ctx._source.phone='perm phone'}"
  }
}

can we update the Mike's solution for multiple fields ?? or Is there any alternative way we can do this ? Thanks again !

Upvotes: 1

Views: 991

Answers (1)

Miek
Miek

Reputation: 1228

This is happening because even though you're not actually modifying all the documents you retrieve in the _update_by_query request, they are at least being counted as hits in your query.

Instead of doing your field check in the update query, add this exist query to your update request:

"query": {
  "exists": {
    "field": "address"
  }
}

Editing to respond to the updated question: If you want to update 3 fields, you're most likely going to have the easiest time just doing 3 simultaneous modifications via the single script. As shown above, the count isn't really reflective of the actual modifications being done to the documents. If the count is absolutely required, you could do this (and it would filter to any document that has any one of these 3.)

"query": {
  "bool": {
    "minimum_should_match": 1,
    "should": [
      {
        "term": {
          "address": {
            "value": "temp address"
          }
        }
      },
      {
        "term": {
          "email": {
            "value": "temp email"
          }
        }
      },
      {
      "term": {
        "phone": {
          "value": "temp phone"
        }
      }
    }
  ]
  }
  }

Upvotes: 2

Related Questions