user2689782

Reputation: 767

Elasticsearch script_fields to update another field?

Is there a way to use the output of an Elasticsearch script_fields entry to update another field in the index?

I have an index in Elasticsearch 1.x which has _timestamp enabled but not stored (see the mapping below).

This means the timestamp is available to searches, and can be read through script_fields like this:

GET twitter/_search
{
  "script_fields": {
    "script1": {
      "script": "_fields['_timestamp']"
    }
  }
}

I need to extract this timestamp field and store it in the index. Copying any other field with a script is easy enough, for example (I am using the update API):

ctx._source.t1=ctx._source.message

But how can I use the value from the script_fields output to update another field in the index? I want the field 'tcopy' to get the value of the timestamp for each document.

I also tried to get the values from Java, as below, but it returned null.

SearchResponse response = client.prepareSearch("twitter")
        .setQuery(QueryBuilders.matchAllQuery())
        .addScriptField("test", "doc['_timestamp'].value")
        .execute().actionGet();

The mapping:

{
  "mappings": {
    "tweet": {
      "_timestamp": {
        "enabled": true,
        "doc_values": true
      },
      "properties": {
        "message": {
          "type": "string"
        },
        "user": {
          "type": "string"
        },
        "tcopy": {
          "type": "long"
        }
      }
    }
  }
}

Upvotes: 0

Views: 1078

Answers (2)

user2689782

Reputation: 767

The _timestamp field can be read from Java through a script field, and the Update API can then be used to set the new field. The search request looks like this:

SearchResponse response = client.prepareSearch("twitter2")
        .setQuery(QueryBuilders.matchAllQuery())
        .addScriptField("test", "doc['_timestamp'].value")
        .execute().actionGet();

Then I can use UpdateRequestBuilder with a script that takes this value and writes it into the new field of each document.
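As a rough illustration of that last step, here is a minimal Python sketch of the request bodies such a scripted update could send. It assumes the Elasticsearch 1.x update body layout with "script" and "params" keys, and that dynamic scripting is enabled on the cluster. The helper name script_update_bodies and the sample response are made up for illustration. Keep in mind that a plain search only returns the first page of hits (10 by default), so a larger index would need paging or scrolling.

```python
import json


def script_update_bodies(search_response, script_field="test", target="tcopy"):
    """Yield (doc_id, update_body) pairs, one per hit in a search response."""
    for hit in search_response["hits"]["hits"]:
        # Script field values come back as a list under "fields".
        timestamp = hit["fields"][script_field][0]
        body = {
            # The script reads the value from params instead of inlining it.
            "script": "ctx._source." + target + " = ts",
            "params": {"ts": timestamp},
        }
        yield hit["_id"], body


# Hand-written stand-in for a search response (hypothetical, not real output).
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "twitter2",
                "_type": "tweet",
                "_id": "1",
                "fields": {"test": [1496806671021]},
            }
        ]
    }
}

for doc_id, body in script_update_bodies(sample_response):
    print(doc_id, json.dumps(body, sort_keys=True))
```

Each body would then be sent to the update endpoint of the matching document.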

Upvotes: 0

Val

Reputation: 217504

You need to do this in two steps:

  1. Run the query to get an id-to-timestamp mapping.
  2. Run a bulk update that writes each timestamp back to its document.

So to extract the timestamp data from your twitter index you can for instance use elasticdump like this:

elasticdump \
   --input=http://localhost:9200/twitter \
   --output=$ \
   --searchBody '{"script_fields": {"ts": {"script": "doc._timestamp.value"}}}' > twitter.json

This will produce a file called twitter.json with the following content:

{"_index":"twitter","_type":"tweet","_id":"1","_score":1,"fields":{"ts":[1496806671021]}}
{"_index":"twitter","_type":"tweet","_id":"2","_score":1,"fields":{"ts":[1496807154630]}}
{"_index":"twitter","_type":"tweet","_id":"3","_score":1,"fields":{"ts":[1496807161591]}}

You can then easily use that file to update your documents. First create a shell script named read.sh:

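#!/bin/sh
# Read one elasticdump hit per line from stdin and copy its timestamp into tcopy.
while IFS= read -r LINE; do
    # jq -r prints raw strings, so the surrounding quotes need no stripping
    INDEX=$(printf '%s\n' "${LINE}" | jq -r '._index')
    TYPE=$(printf '%s\n' "${LINE}" | jq -r '._type')
    ID=$(printf '%s\n' "${LINE}" | jq -r '._id')
    TS=$(printf '%s\n' "${LINE}" | jq '.fields.ts[0]')
    # Partial update: only the tcopy field is set
    curl -XPOST "http://localhost:9200/$INDEX/$TYPE/$ID/_update" -d "{\"doc\":{\"tcopy\":${TS}}}"
done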

And finally you can run it like this:

./read.sh < twitter.json

After the script has finished running, your documents will have a tcopy field with the _timestamp value.
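If one HTTP request per document turns out to be too slow, the same file could instead be converted into a single _bulk request body, which is closer to the "bulk update" of step 2. This is only a sketch: it assumes the dump layout shown above and the bulk format of alternating action and payload lines, each terminated by a newline. The helper name to_bulk_body is hypothetical.

```python
import json


def to_bulk_body(lines, field="tcopy"):
    """Build a newline-delimited _bulk body of partial updates from dump lines."""
    out = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        hit = json.loads(line)
        # Action line: which document to update.
        action = {
            "update": {
                "_index": hit["_index"],
                "_type": hit["_type"],
                "_id": hit["_id"],
            }
        }
        # Payload line: partial document carrying the copied timestamp.
        payload = {"doc": {field: hit["fields"]["ts"][0]}}
        out.append(json.dumps(action))
        out.append(json.dumps(payload))
    # Every line, including the last one, must end with a newline.
    return "".join(entry + "\n" for entry in out)


# One line copied from the sample output above.
sample_lines = [
    '{"_index":"twitter","_type":"tweet","_id":"1","_score":1,"fields":{"ts":[1496806671021]}}',
]

print(to_bulk_body(sample_lines), end="")
```

The resulting text can be written to a file, say bulk.json, and sent with curl -XPOST http://localhost:9200/_bulk --data-binary @bulk.json. For a very large index you would probably want to split it into several smaller requests.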

Upvotes: 1
