eran

Reputation: 15136

Accessing the normalized tokens of an Elasticsearch document

I'm using the built-in English analyzer on text fields in my Elasticsearch documents.

I'm interested in accessing the list of normalized terms. For example, if the text is "Set the shape to semi-transparent by calling set_trans(5)", I want to get the normalized tokens set, shape, semi, transpar, call, set_tran, 5.

Is that possible?

Upvotes: 2

Views: 259

Answers (2)

Jettro Coenradie

Reputation: 4733

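I would use the term vectors API (the _termvectors endpoint) for this: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html. It returns the terms that the field's analyzer produced for a given document, together with their positions and offsets.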
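A minimal sketch of calling the term vectors API from Python and turning the response into an ordered token list. It assumes Elasticsearch 7.x or later (URL form /{index}/_termvectors/{id}), a node at localhost:9200, and a hypothetical index "docs", document id "1" and text field "body"; adjust these to your setup. The terms come back as an object keyed by term, so the sketch re-sorts them by the position recorded for each occurrence.

```python
import json
import urllib.request

# Assumption: an Elasticsearch node listening on localhost:9200.
ES_URL = "http://localhost:9200"


def fetch_term_vectors(index: str, doc_id: str, field: str) -> dict:
    """GET the term vectors of one field of one document (7.x+ URL form)."""
    url = f"{ES_URL}/{index}/_termvectors/{doc_id}?fields={field}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def terms_in_position_order(response: dict, field: str) -> list[str]:
    """Return the field's analyzed terms ordered by token position.

    A term that occurs several times appears once per occurrence.
    """
    terms = response["term_vectors"][field]["terms"]
    positioned = [
        (token["position"], term)
        for term, info in terms.items()
        for token in info.get("tokens", [])
    ]
    return [term for _, term in sorted(positioned)]
```

Usage, with the hypothetical names above: terms_in_position_order(fetch_term_vectors("docs", "1", "body"), "body"). Positions are returned by default; if the field has no stored term vectors, Elasticsearch computes them on the fly from the document source.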

Upvotes: 2

Hkntn

Reputation: 366

You can use the Analyze API: send it any string and it returns the tokens extracted from it.
Examples from the documentation:

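# Current Elasticsearch versions require the Content-Type header and name these parameters filter / char_filter.
curl -XGET 'localhost:9200/_analyze' -H 'Content-Type: application/json' -d '
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "text" : "this is a test"
}'

curl -XGET 'localhost:9200/_analyze' -H 'Content-Type: application/json' -d '
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}'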
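For the question's case, the request can name the built-in "english" analyzer instead of a tokenizer and filters. The sketch below is an illustration, not the documented client: it assumes a node at localhost:9200, uses only the Python standard library, and sends the same JSON body as the curl examples. The exact tokens returned depend on your Elasticsearch version. To reproduce the analyzer configured on a mapped field, send {"field": "<field name>", "text": ...} to /<index>/_analyze instead of naming an analyzer.

```python
import json
import urllib.request

# Assumption: an Elasticsearch node listening on localhost:9200.
ES_URL = "http://localhost:9200"


def build_analyze_body(text: str, analyzer: str = "english") -> dict:
    """Request body that runs a built-in analyzer over arbitrary text."""
    return {"analyzer": analyzer, "text": text}


def tokens_from_response(response: dict) -> list[str]:
    """Extract the token strings, in order, from an _analyze response."""
    return [t["token"] for t in response["tokens"]]


def analyze(text: str, analyzer: str = "english") -> list[str]:
    """POST the text to _analyze and return the tokens it emits."""
    req = urllib.request.Request(
        f"{ES_URL}/_analyze",
        data=json.dumps(build_analyze_body(text, analyzer)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return tokens_from_response(json.load(resp))
```

Calling analyze("Set the shape to semi-transparent by calling set_trans(5)") should give the stemmed, stopword-free tokens the question asks about. Note that this analyzes whatever text you send; it does not read a stored document.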

Upvotes: 2
