Upsert by query

Question

I'm attempting to create or update a document whose ID I don't have. What I am currently doing is searching for the existing (or not) document, updating it, and pushing it back, and that works.

However I would like to do it all at once.

I've read about update by query, which doesn't look like it will work for this case. I've also tried doing it with scripts, but I only found references for updating an existing document (for which I need the ID).

Not sure if this is even possible on ES.

Any help/tips are highly appreciated.

Thanks

More info:

In my case I don't have a direct relation to the IDs, which is why I intended to update by query.

The document I have is as simple as this:

{
  "text": "some text",
  "type": "a real type",
  "occurences": 2
}

So I would have to match it by both the text and type keys. If no such document exists, a new one should be added (with occurences set to 1). If one is found, its occurences should be updated to 3.

Following the documentation of update_by_query, it should be possible to do something like:

POST /test/type/_update_by_query?conflicts=proceed
{
  "query": {
    "bool": {
      "must": [
         {"match_phrase": {"text": "some text"}},
         {"match_phrase": {"type": "a real type"}}
      ]
    }
  }
}

But I have no idea how to go from here.

Daniel Staleiny · Accepted Answer

I stumbled upon this exact issue when using dynamically generated IDs without storing them.

I believe it is not possible to do this in one query, but you can use _update_by_query and check the response body for the update count. If it is 0, you can safely insert a new instance.

So in your case it would be something like this:

    POST /test/type/_update_by_query
    {
      "script": {
        "inline": "ctx._source.occurences++"
      },
      "query": {
        "bool": {
          "must": [
             {"match_phrase": {"text": "some text"}},
             {"match_phrase": {"type": "a real type"}}
          ]
        }
      }
    }

Response could be:

    {
      "took": 2,
      "timed_out": false,
      "total": 0,
      "updated": 0,
      "deleted": 0,
      "batches": 0,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": -1,
      "throttled_until_millis": 0,
      "failures": []
    }

Check whether response.updated == 0. If it is, nothing matched and you can safely insert a new object (check version_conflicts as well):

POST /test/type/
{
  "text": "some text",
  "type": "a real type",
  "occurences": 1
}

Otherwise do nothing: the occurences count has already been updated.
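The update-then-insert flow above can be sketched in Python. This is only a minimal sketch: it assumes a client shaped like elasticsearch-py 7.x (keyword arguments `index` and `body`), the function name `upsert_by_query` and the `InMemoryClient` stand-in are hypothetical, and the stand-in treats `match_phrase` as plain equality so the example runs without a cluster. It also uses `source` for the script, which is the newer name for what the request above calls `inline`.

```python
def upsert_by_query(client, index, text, type_value):
    """Increment `occurences` on matching documents, or insert one with 1."""
    query = {
        "bool": {
            "must": [
                {"match_phrase": {"text": text}},
                {"match_phrase": {"type": type_value}},
            ]
        }
    }
    response = client.update_by_query(
        index=index,
        body={"script": {"source": "ctx._source.occurences++"}, "query": query},
        conflicts="proceed",  # count conflicts instead of aborting the request
        refresh=True,         # make the change visible to the next search
    )
    if response["updated"] > 0:
        return "updated"
    if response.get("version_conflicts", 0) > 0:
        # A matching document exists but changed concurrently; the caller should retry.
        return "conflict"
    client.index(
        index=index,
        body={"text": text, "type": type_value, "occurences": 1},
        refresh=True,
    )
    return "created"


class InMemoryClient:
    """Hypothetical stand-in for a real client, for illustration only."""

    def __init__(self):
        self.docs = []

    def update_by_query(self, index, body, **params):
        must = body["query"]["bool"]["must"]
        # Simplification: treat each match_phrase clause as an exact match.
        wanted = {k: v for clause in must for k, v in clause["match_phrase"].items()}
        updated = 0
        for doc in self.docs:
            if all(doc.get(k) == v for k, v in wanted.items()):
                doc["occurences"] += 1
                updated += 1
        return {"updated": updated, "version_conflicts": 0}

    def index(self, index, body, **params):
        self.docs.append(dict(body))
        return {"result": "created"}
```

Calling `upsert_by_query(client, "test", "some text", "a real type")` twice against the stand-in first inserts the document and then increments it. Against a real cluster, two concurrent callers can still both see `updated == 0` and both insert, which is the race condition discussed next.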

With this solution you can end up with a race condition between concurrent requests, and you will get version_conflicts. If you have this problem, you can do three things:

1. Use a queue and a worker so that requests run one after another.
2. Use a simple query to get the IDs and then use upserts, where you can specify the number of retries on conflict, among other things. Bulk updates are also an option.
3. Use these options:

waitForCompletion: true, conflicts: "proceed", refresh: true

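With these options the request blocks until it has completed, so response times will be longer. Refreshing after every request is also generally discouraged, because a refresh is a relatively expensive operation: it makes the latest changes visible to search, it does not re-index your data. The benefit is that the next query sees the latest version of each document, so version conflicts become much less likely, and with conflicts set to "proceed" any that remain are counted in version_conflicts instead of aborting the request.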
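The second option (query for the ID, then update with retries) could look roughly like the following. Again this is a sketch under assumptions, not a definitive implementation: it assumes an elasticsearch-py 7.x style client whose `update` call accepts `retry_on_conflict`, and the names `increment_or_create` and `StubClient` are hypothetical. The stub only mimics the response shapes the function reads.

```python
def increment_or_create(client, index, text, type_value, retries=3):
    """Look up the document ID by query, then update it by ID with retries."""
    query = {
        "bool": {
            "must": [
                {"match_phrase": {"text": text}},
                {"match_phrase": {"type": type_value}},
            ]
        }
    }
    new_doc = {"text": text, "type": type_value, "occurences": 1}
    hits = client.search(index=index, body={"query": query, "size": 1})["hits"]["hits"]
    if not hits:
        # No match, so there is no ID to update: index a fresh document.
        client.index(index=index, body=new_doc)
        return "created"
    client.update(
        index=index,
        id=hits[0]["_id"],
        body={
            "script": {"source": "ctx._source.occurences++"},
            # Used only if the document vanished between the search and the update.
            "upsert": new_doc,
        },
        retry_on_conflict=retries,
    )
    return "updated"


class StubClient:
    """Hypothetical in-memory stand-in so the sketch runs without a cluster."""

    def __init__(self):
        self.docs = {}  # maps document ID to its source

    def search(self, index, body):
        must = body["query"]["bool"]["must"]
        # Simplification: treat each match_phrase clause as an exact match.
        wanted = {k: v for clause in must for k, v in clause["match_phrase"].items()}
        hits = [
            {"_id": doc_id, "_source": doc}
            for doc_id, doc in self.docs.items()
            if all(doc.get(k) == v for k, v in wanted.items())
        ]
        return {"hits": {"hits": hits[: body.get("size", 10)]}}

    def index(self, index, body):
        doc_id = str(len(self.docs) + 1)
        self.docs[doc_id] = dict(body)
        return {"_id": doc_id, "result": "created"}

    def update(self, index, id, body, retry_on_conflict=0):
        self.docs[id]["occurences"] += 1
        return {"_id": id, "result": "updated"}
```

Because the update targets a single document by ID, the retry on conflict happens on the server side. The window between the search and the insert still allows duplicates under concurrency, so this reduces the race rather than removing it.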
