Hernantz

Reputation: 566

Elasticsearch custom analyzer being ignored

I'm using Elasticsearch 2.2.0 and I'm trying to use the lowercase + asciifolding filters on a field.

This is the output of http://localhost:9200/myindex/

{
    "myindex": {
        "aliases": {}, 
        "mappings": {
            "products": {
                "properties": {
                    "fold": {
                        "analyzer": "folding", 
                        "type": "string"
                    }
                }
            }
        }, 
        "settings": {
            "index": {
                "analysis": {
                    "analyzer": {
                        "folding": {
                            "token_filters": [
                                "lowercase", 
                                "asciifolding"
                            ], 
                            "tokenizer": "standard", 
                            "type": "custom"
                        }
                    }
                }, 
                "creation_date": "1456180612715", 
                "number_of_replicas": "1", 
                "number_of_shards": "5", 
                "uuid": "vBMZEasPSAyucXICur3GVA", 
                "version": {
                    "created": "2020099"
                }
            }
        }, 
        "warmers": {}
    }
}

When I test the custom folding analyzer with the _analyze API, this is the output of http://localhost:9200/myindex/_analyze?analyzer=folding&text=%C3%89sta%20est%C3%A1%20loca

{
    "tokens": [
        {
            "end_offset": 4, 
            "position": 0, 
            "start_offset": 0, 
            "token": "Ésta", 
            "type": "<ALPHANUM>"
        }, 
        {
            "end_offset": 9, 
            "position": 1, 
            "start_offset": 5, 
            "token": "está", 
            "type": "<ALPHANUM>"
        }, 
        {
            "end_offset": 14, 
            "position": 2, 
            "start_offset": 10, 
            "token": "loca", 
            "type": "<ALPHANUM>"
        }
    ]
}

As you can see, the returned tokens are Ésta, está, loca instead of esta, esta, loca. What's going on? It seems that the lowercase and asciifolding filters of the folding analyzer are being ignored.

Upvotes: 1

Views: 196

Answers (1)

IanGabes

Reputation: 2797

Looks like a simple typo when you are creating your index.

In your "analysis":{"analyzer":{...}} block, this:

"token_filters": [...]

Should be

"filter": [...]
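
A typo like this can be caught before the index is created. The following is a minimal sketch, not part of Elasticsearch: the function name `unknown_analyzer_keys` and the set of accepted keys are assumptions based on the parameters documented for custom analyzers, so check the set against the documentation for your version.

```python
# Keys a "custom" analyzer definition is assumed to accept.
# This set is an assumption; verify it against the docs for your ES version.
KNOWN_CUSTOM_ANALYZER_KEYS = {
    "type",
    "tokenizer",
    "filter",
    "char_filter",
    "position_increment_gap",
}


def unknown_analyzer_keys(index_settings):
    """Return {analyzer_name: [unrecognized keys]} for custom analyzers."""
    analyzers = index_settings.get("analysis", {}).get("analyzer", {})
    problems = {}
    for name, definition in analyzers.items():
        # Only custom analyzers are checked; built-in types take other keys.
        if definition.get("type", "custom") != "custom":
            continue
        unknown = sorted(set(definition) - KNOWN_CUSTOM_ANALYZER_KEYS)
        if unknown:
            problems[name] = unknown
    return problems


# The analysis settings from the question, typo included.
settings_from_question = {
    "analysis": {
        "analyzer": {
            "folding": {
                "type": "custom",
                "tokenizer": "standard",
                "token_filters": ["lowercase", "asciifolding"],
            }
        }
    }
}

print(unknown_analyzer_keys(settings_from_question))
# → {'folding': ['token_filters']}
```

An empty result means every key in every custom analyzer was recognized.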

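The custom analyzer documentation confirms that the parameter is named filter. Because your filter array wasn't named correctly, ES ignored it, so the analyzer ran with only the standard tokenizer and no token filters at all (the built-in standard analyzer would at least have lowercased Ésta). Here is a small example written using the Sense Chrome plugin. Execute the requests in order: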

DELETE /test

PUT /test
{
   "settings": {
      "analysis": {
         "analyzer": {
            "folding": {
               "type": "custom",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ],
               "tokenizer": "standard"
            }
         }
      }
   }
}

GET /test/_analyze
{
    "analyzer":"folding",
    "text":"Ésta está loca"
}

And the results of the last GET /test/_analyze:

"tokens": [
      {
         "token": "esta",
         "start_offset": 0,
         "end_offset": 4,
         "type": "<ALPHANUM>",
         "position": 0
      },
      {
         "token": "esta",
         "start_offset": 5,
         "end_offset": 9,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "loca",
         "start_offset": 10,
         "end_offset": 14,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
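
To sanity-check the expected tokens without a running cluster, the filter chain can be approximated in plain Python. This is only a rough sketch under stated assumptions: splitting on whitespace stands in for the standard tokenizer, and stripping combining marks after Unicode decomposition stands in for asciifolding, which handles many more characters (for example ø and ß). The names `fold` and `analyze` are hypothetical helpers, not Elasticsearch APIs.

```python
import unicodedata


def fold(token):
    """Approximate lowercase + asciifolding for accented Latin letters."""
    lowered = token.lower()
    # Decompose "é" into "e" plus a combining accent, then drop the accent.
    decomposed = unicodedata.normalize("NFD", lowered)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Whitespace split is a stand-in for the standard tokenizer."""
    return [fold(token) for token in text.split()]


print(analyze("Ésta está loca"))
# → ['esta', 'esta', 'loca']
```

The result matches the tokens returned by the corrected folding analyzer for this input.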

Upvotes: 1
