Reputation: 15136
I'm using the standard English analyzer on text fields in my Elasticsearch docs.
I'd like to access the list of normalized terms. For example, if the text is "Set the shape to semi-transparent by calling set_trans(5)", I want to get the normalized tokens set, shape, semi, transpar, call, set_tran, 5.
Is that possible?
Upvotes: 2
Views: 259
Reputation: 4733
I would use the term vectors (_termvectors) endpoint for this: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html
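To make that concrete, here is a rough sketch of such a request. The node address, index name, document ID and field name are all placeholders I made up, and the two shell functions are hypothetical helpers, not part of Elasticsearch. The URL uses the /<index>/_termvectors/<id> form; older versions put the mapping type in the path instead, so adjust for your release.

```shell
# Sketch only: assumes a node at localhost:9200 unless ES_URL is set.
ES_URL="${ES_URL:-localhost:9200}"

# Hypothetical helper: print a term vectors request body for one field ($1),
# with positions, offsets and statistics switched off so mostly the terms remain.
termvectors_body() {
  cat <<EOF
{
  "fields" : ["$1"],
  "positions" : false,
  "offsets" : false,
  "field_statistics" : false,
  "term_statistics" : false
}
EOF
}

# Hypothetical helper: fetch term vectors for document $2 of index $1, field $3.
# The body is piped to curl on stdin (-d @-).
fetch_termvectors() {
  termvectors_body "$3" |
    curl -s -XGET "$ES_URL/$1/_termvectors/$2" \
      -H 'Content-Type: application/json' -d @-
}
```

Called as, say, fetch_termvectors my_index 1 body (all three names are made up), the response should list the indexed terms as the keys of term_vectors.<field>.terms, that is, the tokens as the field's analyzer produced them for that document.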
Upvotes: 2
Reputation: 366
You can use the Analyze API: send it any string and it returns the tokens extracted from it.
Example from the documentation:
# Parameter names as in the documentation at the time; newer releases call them "filter" and "char_filter".
curl -XGET 'localhost:9200/_analyze' -d '
{
  "tokenizer" : "keyword",
  "filters" : ["lowercase"],
  "text" : "this is a test"
}'

curl -XGET 'localhost:9200/_analyze' -d '
{
  "tokenizer" : "keyword",
  "token_filters" : ["lowercase"],
  "char_filters" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}'
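For the question as asked, the same API also accepts an "analyzer" parameter, so you can name the built-in english analyzer directly instead of assembling a tokenizer and filters. A minimal sketch, assuming a node at localhost:9200; the two function names are hypothetical helpers of my own:

```shell
# Sketch only: assumes a node at localhost:9200 unless ES_URL is set.
ES_URL="${ES_URL:-localhost:9200}"

# Hypothetical helper: print an _analyze request body that runs the
# sample sentence from the question through the built-in "english" analyzer.
english_analyze_body() {
  cat <<'EOF'
{
  "analyzer" : "english",
  "text" : "Set the shape to semi-transparent by calling set_trans(5)"
}
EOF
}

# Hypothetical helper: send the body to the Analyze API and print the raw JSON.
# The Content-Type header is required by newer releases.
analyze_english() {
  english_analyze_body |
    curl -s -XGET "$ES_URL/_analyze" \
      -H 'Content-Type: application/json' -d @-
}
```

Running analyze_english returns a JSON object with a "tokens" array, and the "token" value of each entry is the normalized term. Note that this analyzes an ad hoc string, not a stored document.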
Upvotes: 2