Create keyword string type with custom analyzer in 5.3.0

Question

I have a string I'd like to index as a keyword type, but with a special comma analyzer. For example:

"San Francisco, Boston, New York" -> "San Francisco", "Boston", "New York"

The field should be both indexed and aggregatable at the same time, so that I can split it up into buckets. Before 5.0.0 the following worked. Index settings:

{
     'settings': {
         'analysis': {
             'tokenizer': {
                 'comma': {
                     'type': 'pattern',
                     'pattern': ','
                 }
             },
             'analyzer': {
                 'comma': {
                     'type': 'custom',
                     'tokenizer': 'comma'
                 }
             }
         },
     },
}

with the following mapping:

{
    'city': {
        'type': 'string',
        'analyzer': 'comma'
    },
}

Now, in 5.3.0 and above, analyzer is no longer a valid property for the keyword type, and my understanding is that I want a keyword type here. How do I specify an aggregatable, indexed, searchable text type with a custom analyzer?

user3775217 · Accepted Answer

You can use multi-fields to index the same field in two different ways: one for searching and the other for aggregations.

I also suggest adding lowercase and trim filters to the analyzer, so that the tokens it produces are normalized and easier to search.
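To see what those two filters buy you, here is a rough Python sketch of what a comma pattern tokenizer followed by lowercase and trim does to a value. The comma_analyze function is a hypothetical stand-in written for illustration, not Elasticsearch code, and it only approximates the real analysis chain.

```python
import re

def comma_analyze(text):
    """Approximate a comma analyzer: split on ',', then lowercase and trim."""
    # Pattern tokenizer stand-in: split on the regex "," and skip empty pieces.
    tokens = [piece for piece in re.split(r",", text) if piece]
    # "lowercase" then "trim" token filters, applied to every token.
    return [token.lower().strip() for token in tokens]

print(comma_analyze("San Francisco, Boston, New York"))
# ['san francisco', 'boston', 'new york']
```

Without the trim step the second and third tokens would keep their leading space (" boston", " new york"), which is easy to miss when writing queries.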

Mappings

PUT commaindex2
    {
        "settings": {
            "analysis": {
                "tokenizer": {
                    "comma": {
                        "type": "pattern",
                        "pattern": ","
                    }
                },
                "analyzer": {
                    "comma": {
                        "type": "custom",
                        "tokenizer": "comma",
                        "filter": ["lowercase", "trim"]
                    }
                }
            }
        },
        "mappings": {
            "city_document": {
                "properties": {
                    "city": {
                        "type": "keyword",
                        "fields": {
                            "city_custom_analyzed": {
                                "type": "text",
                                "analyzer": "comma",
                                "fielddata": true
                            }
                        }
                    }
                }
            }
        }
    }

Index Document

POST commaindex2/city_document
{
  "city" : "san francisco, new york, london"
}

Search Query

POST commaindex2/city_document/_search
{
    "query": {
        "bool": {
            "must": [{
                "term": {
                    "city.city_custom_analyzed": {
                        "value": "new york"
                    }
                }
            }]
        }
    },
    "aggs": {
        "terms_agg": {
            "terms": {
                "field": "city",
                "size": 10
            }
        }
    }
}
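One thing to keep in mind with the term query above: term queries do not analyze the search value, so it has to equal an indexed token exactly. A small Python sketch of that behaviour, assuming the tokens are lowercased and trimmed as in the analyzer above (indexed_tokens and term_matches are hypothetical helpers, not part of any Elasticsearch client):

```python
def indexed_tokens(text):
    # Stand-in for the analyzer: split on commas, lowercase, trim.
    return {piece.lower().strip() for piece in text.split(",") if piece}

def term_matches(value, text):
    # A term query compares the value as-is against the indexed tokens.
    return value in indexed_tokens(text)

doc = "san francisco, new york, london"
print(term_matches("new york", doc))  # True: equals an indexed token
print(term_matches("New York", doc))  # False: the query value is not lowercased
print(term_matches("york", doc))      # False: tokens are whole cities, not words
```

If the search value comes from user input, lowercase and trim it on the client side first, or use a match query so that the value goes through the same analyzer.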

Note

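If you want to run aggregations on the analyzed tokens, for example to count documents for each city in its own bucket, run the terms aggregation on the city.city_custom_analyzed field instead of city. This works because fielddata is enabled on that text sub-field in the mapping.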

POST commaindex2/city_document/_search
{
    "query": {
        "bool": {
            "must": [{
                "term": {
                    "city.city_custom_analyzed": {
                        "value": "new york"
                    }
                }
            }]
        }
    },
    "aggs": {
        "terms_agg": {
            "terms": {
                "field": "city.city_custom_analyzed",
                "size": 10
            }
        }
    }
}
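The difference between the two aggregation targets can be sketched in plain Python. This is only a simulation of how terms buckets are counted (one count per document per distinct term); terms_buckets is a hypothetical helper and the second document is made up for the example.

```python
from collections import Counter

def terms_buckets(docs, analyzed):
    """Count documents per term, roughly like a terms aggregation's doc_count."""
    counts = Counter()
    for text in docs:
        if analyzed:
            # Analyzed sub-field: one term per comma-separated city.
            terms = {piece.lower().strip() for piece in text.split(",") if piece}
        else:
            # Keyword field: the whole string is a single term.
            terms = {text}
        counts.update(terms)
    return dict(counts)

docs = ["san francisco, new york, london", "new york, boston"]
print(terms_buckets(docs, analyzed=False))  # one bucket per full string
print(terms_buckets(docs, analyzed=True))   # one bucket per city
```

With analyzed=True the "new york" bucket has a count of 2, while the keyword-style aggregation gives each full string its own bucket with a count of 1.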

Hope this helps
