Oleksiy

Reputation: 6567

Create keyword string type with custom analyzer in 5.3.0

I have a string that I'd like to index as a keyword type, but with a custom comma analyzer. For example:

"San Francisco, Boston, New York" -> "San Francisco", "Boston", "New York"

The field should be both indexed and aggregatable so that I can bucket on the individual values. Before 5.0.0 the following worked. Index settings:

{
     'settings': {
         'analysis': {
             'tokenizer': {
                 'comma': {
                     'type': 'pattern',
                     'pattern': ','
                 }
             },
             'analyzer': {
                'comma': {
                     'type': 'custom',
                     'tokenizer': 'comma'
                 }
             }
         },
     },
}

with the following mapping:

{
    'city': {
        'type': 'string',
        'analyzer': 'comma'
    },
}

In 5.3.0 and above, analyzer is no longer a valid property for the keyword type, and my understanding is that I want a keyword type here. How do I specify a field that is aggregatable, indexed, and searchable, with a custom analyzer?

Upvotes: 0

Views: 1005

Answers (2)

Val

Reputation: 217394

Since you're using ES 5.3, I suggest a different approach, using an ingest pipeline to split your field at indexing time.

PUT _ingest/pipeline/city-splitter
{
  "description": "City splitter",
  "processors": [
    {
      "split": {
        "field": "city",
        "separator": ","
      }
    },
    {
      "foreach": {
        "field": "city",
        "processor": {
          "trim": {
            "field": "_ingest._value"
          }
        }
      }
    }
  ]
}
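To make the effect of the pipeline concrete, here is a minimal Python sketch that mimics the two processors outside Elasticsearch. The function name split_and_trim is hypothetical and only used for illustration; it is not part of any Elasticsearch client. It assumes the separator is a literal comma, as in the pipeline above.

```python
def split_and_trim(value, separator=","):
    """Mimic the pipeline: split the string, then trim each element."""
    parts = value.split(separator)            # what the "split" processor does
    return [part.strip() for part in parts]   # "foreach" + "trim" on each value


# Hypothetical document, same shape as the one indexed through the pipeline.
doc = {"city": "San Francisco, Boston, New York"}
doc["city"] = split_and_trim(doc["city"])
print(doc)
```

The point is that the document ends up with an array of clean values in city, which is the same shape you would get if the client had split the string before indexing.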

Then you can index a new document:

PUT cities/city/1?pipeline=city-splitter
{ "city" : "San Francisco, Boston, New York" }

Finally, you can search and sort on city, and run an aggregation on city.keyword (the keyword sub-field that the default dynamic mapping creates for strings), as if the cities had been split in your client application:

POST cities/_search
{
  "query": {
     "match": {
         "city": "boston"
     }
  },
  "aggs": {
    "cities": {
      "terms": {
        "field": "city.keyword"
      }
    }
  }
}

Upvotes: 1

user3775217

Reputation: 4803

You can use multi-fields to index the same field in two different ways: one for searching and the other for aggregations.

I also suggest adding the trim and lowercase filters to the analyzer, so that the tokens it produces are normalized and easier to search.

Mappings

PUT commaindex2
    {
        "settings": {
            "analysis": {
                "tokenizer": {
                    "comma": {
                        "type": "pattern",
                        "pattern": ","
                    }
                },
                "analyzer": {
                    "comma": {
                        "type": "custom",
                        "tokenizer": "comma",
                        "filter": ["lowercase", "trim"]
                    }
                }
            }
        },
        "mappings": {
            "city_document": {
                "properties": {
                    "city": {
                        "type": "keyword",
                        "fields": {
                            "city_custom_analyzed": {
                                "type": "text",
                                "analyzer": "comma",
                                "fielddata": true
                            }
                        }
                    }
                }
            }
        }
    }
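As a rough illustration of what the comma analyzer above emits, here is a small Python sketch that imitates the three steps (pattern tokenizer on ",", then lowercase, then trim). The function comma_analyzer is a hypothetical stand-in written for this answer, not Elasticsearch code, and it ignores edge cases such as empty tokens.

```python
import re


def comma_analyzer(text):
    """Approximate the custom analyzer: split on ',', lowercase, trim."""
    tokens = re.split(",", text)            # pattern tokenizer with pattern ","
    tokens = [t.lower() for t in tokens]    # lowercase filter
    return [t.strip() for t in tokens]      # trim filter


print(comma_analyzer("san fransisco, new york, london"))
```

Because a term query is not analyzed, the value you search for has to equal one of these tokens exactly, which is why the lowercase and trim filters matter.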

Index Document

POST commaindex2/city_document
{
  "city" : "san fransisco, new york, london"
}

Search Query

POST commaindex2/city_document/_search
{
    "query": {
        "bool": {
            "must": [{
                "term": {
                    "city.city_custom_analyzed": {
                        "value": "new york"
                    }
                }
            }]
        }
    },
    "aggs": {
        "terms_agg": {
            "terms": {
                "field": "city",
                "size": 10
            }
        }
    }
}

Note

If you want to aggregate on the individual tokens, for example to get a bucket with a count per city, run the terms aggregation on the city.city_custom_analyzed field instead. This works because fielddata is enabled on that sub-field in the mapping above.

POST commaindex2/city_document/_search
{
    "query": {
        "bool": {
            "must": [{
                "term": {
                    "city.city_custom_analyzed": {
                        "value": "new york"
                    }
                }
            }]
        }
    },
    "aggs": {
        "terms_agg": {
            "terms": {
                "field": "city.city_custom_analyzed",
                "size": 10
            }
        }
    }
}
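The difference between the two aggregations can be sketched in plain Python. This only imitates the bucketing logic under simplifying assumptions; the helper names and the second sample document are hypothetical, and real Elasticsearch responses carry more detail.

```python
from collections import Counter


def keyword_buckets(docs, size=10):
    """Terms agg on the keyword field: the whole string is one term."""
    return Counter(docs).most_common(size)


def analyzed_buckets(docs, size=10):
    """Terms agg on the analyzed sub-field: one term per comma-separated token."""
    counts = Counter()
    for text in docs:
        # A document is counted once per distinct token it contains.
        tokens = {t.strip().lower() for t in text.split(",")}
        counts.update(tokens)
    return counts.most_common(size)


# Hypothetical documents; the first is the one indexed above.
docs = ["san fransisco, new york, london", "new york, boston"]
print(keyword_buckets(docs))
print(analyzed_buckets(docs))
```

Aggregating on city yields one bucket per full string, while aggregating on city.city_custom_analyzed yields one bucket per city, with "new york" counted in both documents.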

Hope this helps

Upvotes: 2
