Horst Seirer

Reputation: 49

Elasticsearch: ingest node - use script processor to populate index fields

I need to populate some index fields with formatted strings by converting the data of other index fields. To do this I defined an ingest pipeline containing a script processor. Everything compiles, but after indexing the target fields are not populated with any values.

index:

PUT my_index
{
  "mappings": {
    "product": {
      "properties": {
        "product_name": {"type": "text", "index": true},
        "formatted_product_name": {"type": "keyword", "index": true},
        "production_date": {"type": "keyword", "index": true},
        "formatted_date": {"type": "keyword", "index": true}
      }
    }
  }
}

With this example index at hand I would like to get the fields formatted_product_name and formatted_date populated by the ingest pipeline logic.

ingest pipeline (without any real logic):

PUT _ingest/pipeline/product_data_preprocessing
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "inline": "def source_fields = [ctx.product_name, ctx.production_date]; def target_fields = [ctx.formatted_product_name, ctx.formatted_date]; for (def i = 0; i < source_fields.length; i++) { target_fields[i] = source_fields[i]; }"
      }
    }
  ]
}

data:

PUT _bulk?pipeline=product_data_preprocessing
{"index": {"_index": "my_index", "_type": "product", "_id": "1"}}
{"product_name": "ipad", "production_date": "2017-02-17"}
{"index": {"_index": "my_index", "_type": "product", "_id": "2"}}
{"product_name": "tv", "production_date": "2017-10-07"}

query:

GET my_index/product/_search
{
  "query": {
    "match_all": {}
  }
}

Remark: the following pipeline works, but it does not scale. I'm therefore looking for a way to populate a set of target fields dynamically by processing the values of a set of source fields.

PUT _ingest/pipeline/product_data_preprocessing
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "inline": "ctx.formatted_date = ctx.production_date"
      }
    }
  ]
}

So is there a way to define a (Painless) script in an ingest pipeline processor that populates a set of index fields dynamically, given a set of source fields, a set of target fields, and the appropriate processing logic?

Upvotes: 3

Views: 3703

Answers (1)

Shadi

Reputation: 10355

I've been searching for how to add a count field using an ingest pipeline and came across your question. After a lot of trial and error, I managed to write a pipeline that splits a string by newlines and then adds a field holding the number of entries in the resulting array. I'm not sure whether it helps, but here it is anyway:

{
  "description" : "split content from Tika into rows",
  "processors" : [
    {
      "gsub": {
        "field": "content",
        "pattern": "\\t+",
        "replacement": " "
      }
    },
    {
      "split": {
        "field": "content",
        "separator": "\\n"
      }
    },
    {
      "script": {
        "inline": "ctx.nrows = ctx.content.size()"
      }
    }
  ]
}

Note that ctx.content will be an array of lines at this point, since it is the result of the previous two processors.
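As for the dynamic source-to-target mapping the question asks about: the posted script builds target_fields from the field values (which are null at that point), so the loop only writes into a temporary list and never into the document. Indexing ctx by field name should work instead. Here is an untested sketch against the original mapping (note that Painless list literals create a List, so size() is used rather than length):

PUT _ingest/pipeline/product_data_preprocessing
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "inline": "def source_fields = ['product_name', 'production_date']; def target_fields = ['formatted_product_name', 'formatted_date']; for (int i = 0; i < source_fields.size(); i++) { ctx[target_fields[i]] = ctx[source_fields[i]]; }"
      }
    }
  ]
}

Any per-field formatting logic can then go into the loop body in place of the plain assignment.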

Upvotes: 0
