Reputation: 2012
In one of our applications, we mainly work with HTML-encoded text that we want to search. I could strip the HTML tags before adding the document to Elasticsearch (I have one field with the complete text including the HTML tags and another with the stripped version).
I was wondering whether there is a standard analyzer available so that I do not have to strip the HTML tags myself beforehand.
I hope somebody can help.
Upvotes: 1
Views: 1413
Reputation: 17441
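The html_strip char filter should help. It removes the HTML markup from the text before the tokenizer runs, so you can test it with the _analyze API. For example: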
curl -XPOST "http://<server>/_analyze?tokenizer=standard&char_filters=html_strip&text='This%20is%20a%20%3Cb%3EDOCUMENT%3C%2Fb%3E%20with%20html'"
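To apply the same filter at index time instead of stripping tags yourself, one option is a custom analyzer that lists html_strip as a char filter. The sketch below only builds the index settings body and prints it as JSON; it does not contact a server. The analyzer name `html_text`, the helper `build_index_settings`, and the added `lowercase` token filter are assumptions for illustration, and the exact request you send may differ between Elasticsearch versions.

```python
import json


def build_index_settings(analyzer_name="html_text"):
    """Return index settings defining a custom analyzer that strips HTML.

    `analyzer_name` is a hypothetical name chosen for this example.
    """
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    analyzer_name: {
                        "type": "custom",
                        # Char filters run first, so tags are removed
                        # before the text reaches the tokenizer.
                        "char_filter": ["html_strip"],
                        "tokenizer": "standard",
                        # Assumption: lowercase tokens for case-insensitive search.
                        "filter": ["lowercase"],
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the body that would be sent when creating the index.
    print(json.dumps(build_index_settings(), indent=2))
```

The printed JSON would be sent as the request body when creating the index, and the field holding the HTML text would then name this analyzer in its mapping. With that in place, a separate pre-stripped field should no longer be needed for searching.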
Upvotes: 1