mloureiro

Reputation: 949

Upsert by query

I'm trying to create or update a document whose ID I don't have. What I currently do is search for the document (which may or may not exist), update it, and push it back, and that works.

However, I would like to do it all in a single request.

I've read about update by query, which doesn't look like it will work for this case. I've also tried doing it with scripts, but I only found references for updating an existing document (so I would need the ID).

I'm not sure if this is even possible in ES.

Any help/tips are highly appreciated.

Thanks


More info:

In my case I have no direct relation to the IDs, which is why I intended to update by query.

The document I have is as simple as this:

{
  "text": "some text",
  "type": "a real type",
  "occurences": 2
}

So I would have to match on both the text and type keys. If no such document exists, a new one should be added (with occurences set to 1); if one is found, its occurences should be incremented (here from 2 to 3).

Following the documentation of update_by_query, it should be possible to do something like:

POST /test/type/_update_by_query?conflicts=proceed
{
  "query": {
    "bool": {
      "must": [
         {"match_phrase": {"text": "some text"}},
         {"match_phrase": {"type": "a real type"}}
      ]
    }
  }
}

But I have no idea how to go from here.

Upvotes: 5

Views: 5538

Answers (2)

MytyMyky

Reputation: 608

The current documentation for the Update API at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#upserts shows how upserts can now be done. The sample uses the upsert property to supply the document's initial contents if it does not exist:

POST test/_update/1
{
  "script": {
    "source": "ctx._source.counter += params.count",
    "lang": "painless",
    "params": {
      "count": 4
    }
  },
  "upsert": {
    "counter": 1
  }
}
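
Applied to the document from the question, the same request shape could be built roughly as below. This is only a sketch: `build_upsert_body` and its `step` parameter are hypothetical names, and the body still has to be sent to `_update/<id>`, so it assumes the document ID is known, which is exactly what the question lacks.

```python
import json


def build_upsert_body(text, doc_type, step=1):
    """Build an Update API body: increment if the doc exists, create it otherwise.

    Hypothetical helper for illustration; field names follow the question.
    """
    return {
        "script": {
            # Runs only when the document already exists.
            "source": "ctx._source.occurences += params.step",
            "lang": "painless",
            "params": {"step": step},
        },
        # Stored as the initial document when the ID does not exist yet.
        "upsert": {"text": text, "type": doc_type, "occurences": 1},
    }


body = build_upsert_body("some text", "a real type")
print(json.dumps(body, indent=2))
```

On the first call for a given ID the upsert document is stored as-is and the script is skipped; on later calls only the script runs.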

Upvotes: 0

Daniel Staleiny

Reputation: 434

I ran into this exact issue when using dynamically generated IDs without storing them.

I believe it is not possible to do this in one query, but you can use _update_by_query and check the update count in the response body; if it is 0, you can safely insert a new document.

So in your case it would be something like this:

    POST /test/type/_update_by_query
    {
      "script": {
        "inline": "ctx._source.occurences++"
      },
      "query": {
        "bool": {
          "must": [
             {"match_phrase": {"text": "some text"}},
             {"match_phrase": {"type": "a real type"}}
          ]
        }
      }
    }

Response could be:

    {
      "took": 2,
      "timed_out": false,
      "total": 0,
      "updated": 0,
      "deleted": 0,
      "batches": 0,
      "version_conflicts": 0,
      "noops": 0,
      "retries": {
        "bulk": 0,
        "search": 0
      },
      "throttled_millis": 0,
      "requests_per_second": -1,
      "throttled_until_millis": 0,
      "failures": []
    }

Check whether response.updated == 0. If it is, you can safely insert a new object (check for conflicts as well):

POST /test/type/
{
  "text": "some text",
  "type": "a real type",
  "occurences": 1
}

Otherwise do nothing: your occurrences count has already been updated.
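
The two-step flow above could be sketched in Python as follows. Treat it as an illustration under assumptions rather than a definitive implementation: `upsert_by_query` is a hypothetical helper, and `client` is assumed to be any object exposing `update_by_query(index=..., body=..., conflicts=...)` and `index(index=..., body=...)`, modeled on the elasticsearch-py 7.x client (newer client versions name some of these parameters differently).

```python
def upsert_by_query(client, index, text, doc_type):
    """Increment matching documents; insert a fresh one if nothing matched.

    Hypothetical helper: `client` is assumed to behave like an
    elasticsearch-py 7.x client.
    """
    body = {
        "script": {"source": "ctx._source.occurences++", "lang": "painless"},
        "query": {
            "bool": {
                "must": [
                    {"match_phrase": {"text": text}},
                    {"match_phrase": {"type": doc_type}},
                ]
            }
        },
    }
    # conflicts="proceed" reports version conflicts in the response
    # instead of aborting the whole request.
    response = client.update_by_query(index=index, body=body, conflicts="proceed")

    if response.get("version_conflicts", 0) > 0:
        # Another writer touched a matching document; let the caller retry.
        return "conflict"
    if response["updated"] == 0:
        # Nothing matched, so create the document with an initial count of 1.
        client.index(index=index, body={"text": text, "type": doc_type, "occurences": 1})
        return "created"
    return "updated"
```

The check and the insert are still two separate requests, so two concurrent callers can both see `updated == 0` and both insert.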

With this solution you can run into a race condition and get version_conflicts. If you hit this problem, there are three things you can do:

  1. Use a queue and a worker to run the requests one after another.
  2. Use a simple query to get the IDs, then use upserts, where you can specify the number of retries on conflict and many other things. Bulk update is also an option.
  3. Use these options:

    waitForCompletion: true, conflicts:"proceed", refresh: true

This makes the request block until it has completed, so response times will be longer. Refreshing after every request is also generally considered bad practice, because each refresh is a relatively expensive operation. With conflicts set to proceed, version conflicts are counted in the response instead of aborting the request.
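
Option 2 could look roughly like the sketch below. `increment_by_id_lookup` is a hypothetical helper, and `client` is again assumed to expose `search(index=..., body=...)` and `update(index=..., id=..., body=..., retry_on_conflict=...)` in the style of the elasticsearch-py 7.x client; `retry_on_conflict` is the Update API setting for the number of retries on a version conflict.

```python
def increment_by_id_lookup(client, index, text, doc_type, retries=3):
    """Look up matching IDs, then update each one by ID with conflict retries.

    Hypothetical helper: `client` is assumed to behave like an
    elasticsearch-py 7.x client. Returns the list of updated IDs.
    """
    search_body = {
        "_source": False,  # only the IDs are needed
        "query": {
            "bool": {
                "must": [
                    {"match_phrase": {"text": text}},
                    {"match_phrase": {"type": doc_type}},
                ]
            }
        },
    }
    hits = client.search(index=index, body=search_body)["hits"]["hits"]

    update_body = {
        "script": {"source": "ctx._source.occurences++", "lang": "painless"}
    }
    updated_ids = []
    for hit in hits:
        # retry_on_conflict lets Elasticsearch re-run the script if the
        # document changed between the read and the write.
        client.update(
            index=index, id=hit["_id"], body=update_body, retry_on_conflict=retries
        )
        updated_ids.append(hit["_id"])
    return updated_ids
```

An empty result means nothing matched, in which case the caller would insert the new document as shown earlier in this answer.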

Upvotes: 5
