Reputation: 6567
I have a string that I'd like to index as a keyword type, but with a special comma analyzer. For example:
"San Francisco, Boston, New York" -> "San Francisco", "Boston", "New York"
The field should be both indexed and aggregatable, so that I can split it up into buckets. Before 5.0.0 the following worked. Index settings:
{
'settings': {
'analysis': {
'tokenizer': {
'comma': {
'type': 'pattern',
'pattern': ','
}
},
'analyzer': {
'comma': {
'type': 'custom',
'tokenizer': 'comma'
}
}
},
},
}
with the following mapping:
{
'city': {
'type': 'string',
'analyzer': 'comma'
},
}
In 5.3.0 and above, analyzer is no longer a valid property for the keyword type, and my understanding is that I want a keyword type here. How do I specify an aggregatable, indexed, searchable text field with a custom analyzer?
Upvotes: 0
Views: 1005
Reputation: 217394
Since you're using ES 5.3, I suggest a different approach, using an ingest pipeline to split your field at indexing time.
PUT _ingest/pipeline/city-splitter
{
"description": "City splitter",
"processors": [
{
"split": {
"field": "city",
"separator": ","
}
},
{
"foreach": {
"field": "city",
"processor": {
"trim": {
"field": "_ingest._value"
}
}
}
}
]
}
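To see what this pipeline does to a value, here is a rough local approximation in plain Python. It is not Elasticsearch code: split_and_trim is a hypothetical helper that only mimics the split processor followed by the foreach/trim step, assuming the field holds a single string.

```python
import re


def split_and_trim(doc, field="city", separator=","):
    """Mimic the 'split' processor, then 'trim' applied to each element.

    Hypothetical helper for illustration only; the real work is done by
    the ingest pipeline inside Elasticsearch.
    """
    result = dict(doc)  # leave the caller's document untouched
    # The split processor treats the separator as a regex, so re.split is used.
    parts = re.split(separator, result[field])
    # The foreach + trim step strips surrounding whitespace from each part.
    result[field] = [part.strip() for part in parts]
    return result


doc = {"city": "San Francisco, Boston, New York"}
print(split_and_trim(doc))
# {'city': ['San Francisco', 'Boston', 'New York']}
```

The point is that the stored document ends up with an array of clean city names, so no custom analyzer is needed on the field.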
Then you can index a new document:
PUT cities/city/1?pipeline=city-splitter
{ "city" : "San Francisco, Boston, New York" }
Finally, you can search/sort on city and run an aggregation on the city.keyword field, as if the cities had been split in your client application:
POST cities/_search
{
"query": {
"match": {
"city": "boston"
}
},
"aggs": {
"cities": {
"terms": {
"field": "city.keyword"
}
}
}
}
Upvotes: 1
Reputation: 4803
You can use multi-fields to index the same field in two different ways, one for searching and the other for aggregations.
I also suggest adding trim and lowercase filters to the tokens produced, which gives you more reliable search.
Mappings
PUT commaindex2
{
"settings": {
"analysis": {
"tokenizer": {
"comma": {
"type": "pattern",
"pattern": ","
}
},
"analyzer": {
"comma": {
"type": "custom",
"tokenizer": "comma",
"filter": ["lowercase", "trim"]
}
}
}
},
"mappings": {
"city_document": {
"properties": {
"city": {
"type": "keyword",
"fields": {
"city_custom_analyzed": {
"type": "text",
"analyzer": "comma",
"fielddata": true
}
}
}
}
}
}
}
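As a rough illustration of what the comma analyzer above emits, the following plain-Python sketch mimics the three stages (pattern tokenizer on ",", then lowercase, then trim). comma_analyzer is a hypothetical local function, not an Elasticsearch API, and it ignores details such as token offsets.

```python
import re


def comma_analyzer(text):
    """Approximate the 'comma' analyzer: split on ',', lowercase, trim."""
    tokens = re.split(",", text)                 # pattern tokenizer
    tokens = [token.lower() for token in tokens]  # lowercase filter
    tokens = [token.strip() for token in tokens]  # trim filter
    return tokens


tokens = comma_analyzer("San Francisco, Boston, New York")
print(tokens)
# ['san francisco', 'boston', 'new york']

# A term query is not analyzed, so its value must equal a token exactly.
print("new york" in tokens)   # True
print("New York" in tokens)   # False
```

This is why the term query in this answer uses the lowercase value "new york": the query value has to match the indexed token exactly.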
Index Document
POST commaindex2/city_document
{
"city" : "san fransisco, new york, london"
}
Search Query
POST commaindex2/city_document/_search
{
"query": {
"bool": {
"must": [{
"term": {
"city.city_custom_analyzed": {
"value": "new york"
}
}
}]
}
},
"aggs": {
"terms_agg": {
"terms": {
"field": "city",
"size": 10
}
}
}
}
Note
If you want to run aggregations on the indexed tokens, for example to count documents per city in buckets, run the terms aggregation on the city.city_custom_analyzed field.
POST commaindex2/city_document/_search
{
"query": {
"bool": {
"must": [{
"term": {
"city.city_custom_analyzed": {
"value": "new york"
}
}
}]
}
},
"aggs": {
"terms_agg": {
"terms": {
"field": "city.city_custom_analyzed",
"size": 10
}
}
}
}
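The difference between the two aggregations can be sketched locally in Python. This is only an approximation of how a terms aggregation counts documents per term; the second sample document and both helper functions are made up for illustration.

```python
from collections import Counter

# First value is the document indexed above; the second is a made-up example.
docs = [
    "san fransisco, new york, london",
    "new york, boston",
]


def keyword_buckets(values):
    """Terms agg on the keyword field: each whole string is one term."""
    return Counter(values)


def analyzed_buckets(values):
    """Terms agg on the comma-analyzed sub-field: one term per city."""
    counts = Counter()
    for value in values:
        # A set is used because a bucket counts documents, not occurrences.
        tokens = {token.strip().lower() for token in value.split(",")}
        counts.update(tokens)
    return counts


print(keyword_buckets(docs)["new york"])    # 0
print(analyzed_buckets(docs)["new york"])   # 2
```

Aggregating on city yields one bucket per full comma-separated string, while aggregating on city.city_custom_analyzed yields one bucket per individual city.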
Hope this helps
Upvotes: 2