Erik Zocher

Reputation: 43

index Elasticsearch document with existing "id" field

I have documents that I want to index into Elasticsearch, each with an existing unique "id" field. I receive an array of documents from a REST API endpoint (e.g. http://some.url/api/products) in no particular order, and if a document with that _id already exists in Elasticsearch, it should be updated and reindexed.

I want to create a new document if no document with that _id exists in Elasticsearch, and update the document if it matches an existing one.

This could be done with:

PUT products/product/un1qu3-1d-b718-105973677e95
{
  "id": "un1qu3-1d-b718-105973677e95",
  "state": "packaged"
}

The basic idea is to use the provided "id" field to create or update a document. Extracting the _id from document fields appears to be deprecated (link). Indexing/reindexing documents with the "id" field can be done manually very easily with the Kibana dev tools, with Postman, or with a cURL request, but I want to achieve this (re-)indexing of the documents I receive over this API endpoint programmatically.
Is it possible to achieve this with Logstash or a simple cronjob? Does Elasticsearch provide any functionality for this, or do I need to write a custom backend?
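For reference, the manual cURL variant of the request above could be scripted like this (the host localhost:9200 is just an assumption for a local cluster):

curl -XPUT 'http://localhost:9200/products/product/un1qu3-1d-b718-105973677e95' \
  -H 'Content-Type: application/json' \
  -d '{"id": "un1qu3-1d-b718-105973677e95", "state": "packaged"}'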

I thought of either:

1) Index the document into Elasticsearch with the "id" field of my document, or

2) find an Elasticsearch query that first searches for the document with the specific "id" field and then updates it.

I was unable to find a solution for either approach and have no clue what a good one would look like.

Can anyone point me into the right direction on how to achieve this, suggest a better approach or provide a solution?

Any help much appreciated!

Update

I solved the problem with the help of the accepted answer. I used Logstash with the Http_poller input plugin, this article: https://www.elastic.co/blog/new-way-to-ingest-part-1 and this elastic.co question: https://discuss.elastic.co/t/upsert-with-logstash/59116

My output of logstash looks like this at the moment:

output {
  elasticsearch {
    index => "products"
    document_type => "product"
    pipeline => "rename_id"
    document_id => "%{id}"
    doc_as_upsert => true
    action => "update"
  }
}
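For context, a minimal sketch of the full Logstash pipeline this output belongs to might look as follows; the poll URL is taken from the question, while the schedule and timeout are assumptions:

input {
  http_poller {
    urls => {
      products => "http://some.url/api/products"
    }
    request_timeout => 60
    # poll the endpoint once a minute
    schedule => { cron => "* * * * * UTC" }
    codec => "json"
  }
}

output {
  elasticsearch {
    index => "products"
    document_type => "product"
    document_id => "%{id}"
    doc_as_upsert => true
    action => "update"
  }
}

With document_id => "%{id}" the elasticsearch output already uses the document's own "id" field as the _id, which is why the pipeline setting is not strictly required in this sketch.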

Update 2

Just for the sake of completeness, here is the "rename_id" pipeline:

{
  "rename_id": {
    "description": "_description",
    "processors": [
      {
        "set": {
          "field": "_id",
          "value": "{{id}}"
        }
      }
    ]
  }
}
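The pipeline definition above is registered once through the ingest API, e.g. from the Kibana dev tools:

PUT _ingest/pipeline/rename_id
{
  "description": "_description",
  "processors": [
    {
      "set": {
        "field": "_id",
        "value": "{{id}}"
      }
    }
  ]
}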

It works this way! Thanks a lot!

Upvotes: 4

Views: 10760

Answers (3)

bornTalented

Reputation: 625

You can use a script to UPSERT (update or insert) your document:

POST /products/product/un1qu3-1d-b718-105973677e95/_update
{
   "script": {
      "inline": "ctx._source.state = \"packaged\"",
      "lang": "painless"
   },
   "upsert": {
      "id": "un1qu3-1d-b718-105973677e95",
      "state": "packaged"
   }
}

The above query looks for the document with _id = "un1qu3-1d-b718-105973677e95". If it finds one, it updates state to "packaged"; otherwise it creates a new document with the fields "id" and "state" (you can include as many fields as you want).
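If the update should simply merge the provided fields rather than run a script, the same effect can be sketched with doc_as_upsert (the same mechanism the Logstash output in the question relies on):

POST /products/product/un1qu3-1d-b718-105973677e95/_update
{
  "doc": {
    "id": "un1qu3-1d-b718-105973677e95",
    "state": "packaged"
  },
  "doc_as_upsert": true
}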

Upvotes: 0

alr

Reputation: 1804

You can use ingest pipelines to extract the id from the body, and the _create endpoint (op_type=create) to only create a document if it does not already exist. Minor note: if you can specify the id on the client side, indexing will be faster, as adding a pipeline introduces some overhead.

PUT _ingest/pipeline/my_pipeline
{
  "description": "_description",
  "processors": [
    {
      "set": {
        "field": "_id",
        "value": "{{id}}"
      }
    }
  ]
}

PUT twitter/tweet/1?op_type=create&pipeline=my_pipeline
{
    "foo" : "bar",
    "id" : "123"
}

GET twitter/tweet/123

# this call will fail
PUT twitter/tweet/1?op_type=create&pipeline=my_pipeline
{
    "foo" : "bar",
    "id" : "123"
}

Upvotes: 2

shan

Reputation: 288

Peter,

If I understand correctly, you want to ingest your documents into Elasticsearch and will have some updates to these documents in the future?

If that's the case:

- Use your document's primary key as the id for the Elasticsearch documents.
- You can ingest the entire document with updated values; Elasticsearch will replace the previous document with the new one, given that the primary key is the same. The old document with the same id will be deleted.

We use this approach for our search data.
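To illustrate the replacement behaviour described above (the "shipped" value below is just a made-up second state): indexing twice to the same id leaves only the second version of the document.

PUT products/product/un1qu3-1d-b718-105973677e95
{
  "id": "un1qu3-1d-b718-105973677e95",
  "state": "packaged"
}

PUT products/product/un1qu3-1d-b718-105973677e95
{
  "id": "un1qu3-1d-b718-105973677e95",
  "state": "shipped"
}

GET products/product/un1qu3-1d-b718-105973677e95
# _source now contains "state": "shipped"; the first version is gone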

Upvotes: 3
