puppylpg

Reputation: 1220

Is it possible to set a new field value while analyzing a document being indexed in Elasticsearch?

For example:

  1. a document is sent to Elasticsearch for indexing;
  2. I want to analyze a field named description in the document with the uax_url_email tokenizer/analyzer;
  3. if description contains any URLs, put them into another array field named urls;
  4. finish indexing the document.

I could then check whether the urls field is empty to know whether description contains any URL.

Is this possible? Or does an analyzer only contribute to the inverted index and not to other fields?

Upvotes: 0

Views: 282

Answers (1)

Sagar Patel

Reputation: 5486

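An analyzer only produces the tokens that go into the inverted index; it does not change the document source or fill other fields. To populate a separate field at index time, you can use an ingest pipeline with a script processor written in Painless. The example below uses the simulate API and a field named content; use description in your case.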

POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "description": "Extract URLs from the 'content' field into 'urls'",
          "lang": "painless",
          "source": """
            // Collect every URL found in the 'content' field.
            def m = /(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])/.matcher(ctx["content"]);
            ArrayList urls = new ArrayList();
            while (m.find()) {
              urls.add(m.group());
            }
            // Store the matches (possibly an empty list) in 'urls'.
            ctx['urls'] = urls;
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "content": "My name is Sagar patel and i visit https://apple.com and https://google.com"
      }
    }
  ]
}

The pipeline above produces the following result:

{
  "docs": [
    {
      "processor_results": [
        {
          "processor_type": "script",
          "status": "success",
          "description": "Extract URLs from the 'content' field into 'urls'",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_source": {
              "urls": [
                "https://apple.com",
                "https://google.com"
              ],
              "content": "My name is Sagar patel and i visit https://apple.com and https://google.com"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2022-07-13T12:45:00.3655307Z"
            }
          }
        }
      ]
    }
  ]
}
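
The simulate call only previews the result. To apply the script when documents are actually indexed, the pipeline can be stored under a name and referenced in the index request. The following is a rough sketch, not a tested setup: the pipeline name extract-urls and the index name my-index are made up for illustration, the field is assumed to be description as in the question, and the null check is an added assumption to avoid a script error on documents without that field.

```
# Store the pipeline under a hypothetical name.
PUT _ingest/pipeline/extract-urls
{
  "description": "Copy URLs found in 'description' into 'urls'",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          def urls = new ArrayList();
          // Skip documents that have no 'description' field.
          if (ctx['description'] != null) {
            def m = /(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])/.matcher(ctx['description']);
            while (m.find()) {
              urls.add(m.group());
            }
          }
          ctx['urls'] = urls;
        """
      }
    }
  ]
}

# Index a document through the pipeline (hypothetical index name).
PUT my-index/_doc/1?pipeline=extract-urls
{
  "description": "See https://apple.com for details"
}

# Find documents whose description contained at least one URL.
GET my-index/_search
{
  "query": { "exists": { "field": "urls" } }
}
```

Since the exists query treats an empty array as a missing value, it should return only documents where at least one URL was extracted, which covers the "is urls empty" check from the question.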

Upvotes: 1
