xxestter

Reputation: 509

What is an Analyzer in Elasticsearch for?

I am having some issues understanding the Elasticsearch analyzer. What is it for, and how do I use it?

From this article, I understand that a tokenizer and token filters are applied to some source text. What I can't understand is where that source text comes from: the URL, or the text inside my indexes? The article says to execute

GET http://localhost:9200/_analyze?text=I%20sing%20he%20sings%20they%20are%20singing&analyzer=snowball

where the text is in the URL, but is this analyzer related to searching the text inside my indexes?

I am quite confused and sorry if my question sounds stupid.

Upvotes: 8

Views: 6855

Answers (2)

Rafiq

Reputation: 11465

Analyzer: An analyzer consists of three things: 1. character filters, 2. a tokenizer, and 3. token filters. An analyzer is basically a package of these building blocks, each of which changes the input stream. When indexing a document, the text goes through the following flow (a sketch tying all three together follows this list):

  1. First, one or more character filters can be added. A character filter receives a text field's original text and can transform the value by adding, removing, or changing characters. An example of this could be stripping out any HTML markup.

  2. Afterwards, a tokenizer splits the text into individual tokens, which will usually be words. So if we have a sentence with 10 words, we get an array of 10 tokens. An analyzer may only have one tokenizer. By default, a tokenizer named standard is used; it applies a Unicode text segmentation algorithm, which basically splits by whitespace and also removes most symbols such as commas, periods, and semicolons. That's because most symbols are not useful when it comes to searching; they are intended for human readers. Besides splitting text into tokens, the tokenizer is also responsible for recording the position of each token, including the start and end character offsets of the word it represents. This makes it possible to map tokens back to the original words, which is used to highlight matching words; the token positions are used when performing phrase searches and proximity searches.

  3. After splitting the text into tokens, it runs through one or more token filters. A token filter may add, remove, or change tokens. This is similar to a character filter, but token filters work with the token stream instead of a character stream. There are a couple of different token filters, the simplest being the lowercase token filter, which just converts all characters to lowercase. Another token filter that can be useful in many cases is stop, which removes common words, referred to as stop words. Another very useful token filter is synonym, which gives similar words the same meaning.
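Tying all three together, here is a minimal sketch of an index definition that wires one character filter, one tokenizer, and three token filters into a custom analyzer; the index name my_index, the analyzer name my_analyzer, and the synonym pair are made-up placeholders:

PUT my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms": ["big, large"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip"],                  --> 1. strip HTML markup
          "tokenizer": "standard",                        --> 2. split into tokens
          "filter": ["lowercase", "stop", "my_synonyms"]  --> 3. token filters, applied in order
        }
      }
    }
  }
}

Any text field mapped with this analyzer is run through the pipeline in that order at index time.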


Upvotes: 2

TechnocratSid

Reputation: 2415

An analyzer is a wrapper around three functions:

  • Character filter: Mainly used to strip out some unused characters or change some characters.
  • Tokenizer: Breaks text into individual tokens (or words) based on certain factors (whitespace, ngram, etc.).
  • Token filter: Receives the tokens and then applies some filters (for example, changing uppercase terms to lowercase).
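
Each of these building blocks can be exercised on its own with the Analyze API described below; a minimal sketch, where the HTML snippet is just a made-up input:

GET _analyze
{
  "char_filter": ["html_strip"],   --> drop the <p> tags
  "tokenizer": "standard",         --> split on whitespace, drop the comma
  "filter": ["lowercase"],         --> lowercase every token
  "text": "<p>I SING, he sings</p>"
}

This should return the tokens i, sing, he and sings.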

In a nutshell, an analyzer tells Elasticsearch how text should be indexed and searched.

And what you're looking at is the Analyze API, which is a very nice tool for understanding how analyzers work. The text is provided directly to this API and is not related to any index.

In your case the GET request:

GET http://localhost:9200/_analyze?text=I%20sing%20he%20sings%20they%20are%20singing&analyzer=snowball

is equivalent to:

GET _analyze
{
  "analyzer" : "snowball",
  "text" : "I sing he sings they are singing"
}

which outputs:

{
  "tokens": [
    {"token": "i", "position": 1, ...},
    {"token": "sing", "position": 2, ...},
    {"token": "he", "position": 3, ...},
    {"token": "sing", "position": 4, ...},
    {"token": "sing", "position": 7, ...}
  ]
}

as mentioned in the article.

One more thing: let's say you have defined a custom analyzer in your index that does character filtering, tokenizing, and token filtering in your own way, and you want to check how it will tokenize some text. You can use the _analyze endpoint with your index name, and even in that case you have to provide the text:

GET my_index/_analyze
{
  "analyzer" : "custom",
  "text" : "I sing he sings they are singing" --> You have to provide the text. 
}
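
For completeness, here is a minimal sketch of how such an analyzer could be declared when creating the index; the analyzer name custom and the particular building blocks chosen are just placeholders:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom": {
          "type": "custom",
          "tokenizer": "whitespace",   --> split on whitespace only
          "filter": ["lowercase"]      --> then lowercase each token
        }
      }
    }
  }
}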

Why analyzers?

Analyzers are generally used when you want to index a text or phrase: breaking the text into words lets you search on individual terms and still get the document back.

Example: Let's say you have an index (my_index) with a text field (intro), and you index a document where "intro": "Hi there I am sid". If you're not using an analyzer, this will be stored as "Hi there I am sid", and to query for this document you will have to write the complete phrase (find documents where intro = "Hi there I am sid"). But if the phrase is indexed as tokens, then even if you query for a single token (find documents where intro = "sid") you'll get the document.
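
As a minimal sketch of that example (using the made-up index, field, and document from the paragraph above, and assuming a recent Elasticsearch version):

PUT my_index/_doc/1
{
  "intro": "Hi there I am sid"
}

GET my_index/_search
{
  "query": {
    "match": { "intro": "sid" }   --> a single token is enough to find the document
  }
}

The match query analyzes the query string with the same analyzer used at index time, so the token sid matches one of the stored tokens hi, there, i, am, sid.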

Note: By default, the standard analyzer is used for all text fields.

Hope it helps!

Upvotes: 21
