Sonu Mishra

Reputation: 1779

How to create a new field from existing fields during indexing in ElasticSearch?

I have a Lambda that receives events from Kinesis and writes each event to an Elasticsearch cluster. The first event for a document is stored like this:

doc id   FirstTimestamp
d1       15974343498

Now, when another event arrives, I want to update the document in Elasticsearch to:

doc id   FirstTimestamp   SecondTimestamp   TimeTag
d1       15974343498      15974344498       1000

How can I do this without first having to GET the existing doc from Elasticsearch and then do a PUT?

I found the update option, with which I can add the SecondTimestamp field, but how can I add the TimeTag field? Computing it requires an operation on the stored FirstTimestamp.

Upvotes: 1

Views: 1219

Answers (1)

Joe - Check out my books

Reputation: 16943

The GET operation won't be necessary.

Depending on how much control you have over how your writes happen, you could do the following:

  1. Store a script which expects the doc-to-be-updated content as params:
POST _scripts/manage_time_tags
{
  "script": {
    "lang": "painless", 
    "source": """
      if (ctx._source.FirstTimestamp != null && params.FirstTimestamp != null) {
        ctx._source.SecondTimestamp = params.FirstTimestamp;
        ctx._source.TimeTag = ctx._source.SecondTimestamp - ctx._source.FirstTimestamp;
      }
    """
  }
}
  2. Instead of writing directly to ES as you have been doing, use the upsert method of the Update API:
POST myindex/_update/1
{
  "upsert": {
    "id": 1,
    "FirstTimestamp": 15974343498
  },
  "script": {
    "id": "manage_time_tags",
    "params": {
      "id": 1,
      "FirstTimestamp": 15974343498
    }
  }
}

This ensures that if the document does not exist yet, the contents of upsert are indexed as the new document and the script does not run at all.

  3. As new events come in, simply call /_update/your_id again, but with the most recent contents of id and FirstTimestamp.
POST myindex/_update/1
{
  "upsert": {
    "id": 1,
    "FirstTimestamp": 15974344498         
  },
  "script": {
    "id": "manage_time_tags",
    "params": {
      "id": 1,
      "FirstTimestamp": 15974344498
    }
  }
}
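
In a Lambda handler, the request from the steps above could be assembled roughly as follows. This is only a sketch in Python: `build_update_request` is a hypothetical helper name, the index name `myindex` and the stored script id `manage_time_tags` are taken from the requests above, and actually sending the request (with whatever HTTP or Elasticsearch client you already use) is left out.

```python
import json


def build_update_request(index, doc_id, timestamp):
    """Return (method, path, body) for an Update API call.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    body = {
        # Indexed as-is only when the document does not exist yet.
        "upsert": {"id": doc_id, "FirstTimestamp": timestamp},
        # Executed only when the document already exists.
        "script": {
            "id": "manage_time_tags",
            "params": {"id": doc_id, "FirstTimestamp": timestamp},
        },
        # Note: no "scripted_upsert" key, so it keeps its default (false).
    }
    return "POST", f"{index}/_update/{doc_id}", body


method, path, body = build_update_request("myindex", 1, 15974344498)
print(method, path)
print(json.dumps(body, indent=2))
```

The same body is sent for every event; whether it creates the document or runs the script is decided by Elasticsearch, so the Lambda never needs to GET first.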

Note: this should not be confused with the rather poorly named scripted upsert (the scripted_upsert option), which runs the script regardless of whether the doc already exists. That option should be omitted (or set to false).
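
To see why the distinction matters, here is a small local simulation in Python. It is not Elasticsearch code: `apply_update` is a hypothetical function that mimics, on a plain dict, how I understand the upsert semantics to work, and its body mirrors the Painless script from step 1.

```python
def apply_update(store, doc_id, upsert, params, scripted_upsert=False):
    """Mimic Update API upsert behaviour on an in-memory dict (illustration only)."""
    doc = store.get(doc_id)
    if doc is None:
        # Missing document: the upsert content becomes the new document.
        store[doc_id] = dict(upsert)
        if not scripted_upsert:
            return store[doc_id]  # plain upsert: the script is skipped
        doc = store[doc_id]  # scripted upsert: the script runs anyway
    # Same logic as the stored manage_time_tags script.
    if doc.get("FirstTimestamp") is not None and params.get("FirstTimestamp") is not None:
        doc["SecondTimestamp"] = params["FirstTimestamp"]
        doc["TimeTag"] = doc["SecondTimestamp"] - doc["FirstTimestamp"]
    return doc


store = {}
first = {"id": "d1", "FirstTimestamp": 15974343498}
second = {"id": "d1", "FirstTimestamp": 15974344498}

apply_update(store, "d1", upsert=first, params=first)    # creates the doc
apply_update(store, "d1", upsert=second, params=second)  # runs the script
print(store["d1"])

# With scripted_upsert=True the very first event would already run the script.
other = {}
apply_update(other, "d1", upsert=first, params=first, scripted_upsert=True)
print(other["d1"])
```

After the two events, the first store holds FirstTimestamp 15974343498, SecondTimestamp 15974344498 and TimeTag 1000, matching the table in the question. In the scripted-upsert case the first event alone would write a SecondTimestamp equal to FirstTimestamp and a TimeTag of 0, which is not what you want here.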

Upvotes: 1
