Reputation: 1220
For example:
- analyze the description field in the document with the uax_url_email tokenizer/analyzer;
- if description contains any URL, put the URL into another array field named urls;
- now I can check whether the urls field is empty to know whether description has any URL.
Is this possible? Or does an analyzer only contribute to the inverted index, not to other fields?
Upvotes: 0
Views: 282
Reputation: 5486
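An analyzer only affects the tokens written to the inverted index; it does not change _source or populate other fields. You can instead use an ingest pipeline with a Script processor and a Painless script that extracts the URLs at index time. I hope this helps.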
POST _ingest/pipeline/_simulate?verbose
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "description": "Extract URLs from 'content' field",
          "lang": "painless",
          "source": """
            // Match http, https, and ftp URLs in the 'content' field
            def m = /(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])/.matcher(ctx["content"]);
            ArrayList urls = new ArrayList();
            // Collect every full match
            while (m.find()) {
              urls.add(m.group());
            }
            // Store the matches in a new 'urls' field (empty list if none)
            ctx['urls'] = urls;
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "content": "My name is Sagar patel and i visit https://apple.com and https://google.com"
      }
    }
  ]
}
The pipeline above produces a result like the following:
{
  "docs": [
    {
      "processor_results": [
        {
          "processor_type": "script",
          "status": "success",
          "description": "Extract URLs from 'content' field",
          "doc": {
            "_index": "_index",
            "_id": "_id",
            "_source": {
              "urls": [
                "https://apple.com",
                "https://google.com"
              ],
              "content": "My name is Sagar patel and i visit https://apple.com and https://google.com"
            },
            "_ingest": {
              "pipeline": "_simulate_pipeline",
              "timestamp": "2022-07-13T12:45:00.3655307Z"
            }
          }
        }
      ]
    }
  ]
}
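To go from the simulation to the check described in the question, the same script could be stored as a named pipeline and applied when indexing. The following is only a sketch: the pipeline name extract-urls, the index name my-index, and the sample document are assumptions made up for illustration, and the script reads the description field from the question instead of content. The null check is an added guard for documents that have no description.

```
# Store the script as a reusable pipeline (the name "extract-urls" is hypothetical)
PUT _ingest/pipeline/extract-urls
{
  "description": "Extract URLs from the 'description' field into 'urls'",
  "processors": [
    {
      "script": {
        "lang": "painless",
        "source": """
          ArrayList urls = new ArrayList();
          // Skip documents that have no 'description' field
          if (ctx['description'] != null) {
            def m = /(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:\/~+#-]*[\w@?^=%&\/~+#-])/.matcher(ctx['description']);
            while (m.find()) {
              urls.add(m.group());
            }
          }
          ctx['urls'] = urls;
        """
      }
    }
  ]
}

# Index a document through the pipeline (the index name "my-index" is hypothetical)
PUT my-index/_doc/1?pipeline=extract-urls&refresh=true
{
  "description": "Docs are at https://www.elastic.co/guide"
}

# Find documents whose description contained at least one URL
GET my-index/_search
{
  "query": {
    "exists": { "field": "urls" }
  }
}
```

The exists query treats an empty array the same as a missing field, so documents where no URL was found (urls is []) are not returned. Wrapping the exists query in a bool query's must_not clause would return the documents without URLs instead.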
Upvotes: 1