I'm trying to retrieve an aggregation of tags (with counts) from Elasticsearch, but hyphenated tags are getting split and returned as separate tags.
For example, given documents like:
{
  "tags": ["foo", "foo-bar", "cheese"]
}
I get back (abridged):
{
  "foo": 8,
  "bar": 3,
  "cheese": 2
}
What I expect to get is:
{
  "foo": 5,
  "foo-bar": 3,
  "cheese": 2
}
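To see why the numbers come out this way, here is a rough Python simulation. It assumes a hypothetical set of eight documents chosen only so the counts match the ones above, and it approximates the standard analyzer by splitting on any non-alphanumeric character (the real tokenizer follows Unicode text segmentation rules, so treat this as a sketch). A terms aggregation counts documents per term, so each term is counted at most once per document.

```python
import re
from collections import Counter

# Hypothetical documents, chosen only so the counts match the question.
DOCS = (
    [{"tags": ["foo"]}] * 5
    + [{"tags": ["foo-bar", "cheese"]}] * 2
    + [{"tags": ["foo-bar"]}]
)


def standard_like(tag):
    # Crude stand-in for the standard analyzer: split on non-alphanumerics, lowercase.
    return [t.lower() for t in re.split(r"[^0-9A-Za-z]+", tag) if t]


def whole_tag(tag):
    # Each tag kept as a single term, which is the behaviour the question expects.
    return [tag]


def terms_agg(docs, analyze):
    # A terms aggregation reports how many documents contain each term,
    # so a term is counted at most once per document.
    counts = Counter()
    for doc in docs:
        terms = set()
        for tag in doc["tags"]:
            terms.update(analyze(tag))
        counts.update(terms)
    return dict(counts)


print(terms_agg(DOCS, standard_like))  # what the question gets back
print(terms_agg(DOCS, whole_tag))      # what the question expects
```

Every foo-bar tag also yields a foo term, so the foo bucket absorbs the three foo-bar documents and a separate bar bucket appears.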
My mapping is:
{
  "asset" : {
    "properties" : {
      "name" : {"type" : "string"},
      "path" : {"type" : "string", "index" : "not_analyzed"},
      "url": {"type" : "string"},
      "tags" : {"type" : "string", "index_name" : "tag"},
      "created": {"type" : "date"},
      "updated": {"type" : "date"},
      "usages": {"type" : "string", "index_name" : "usage"},
      "meta": {"type": "object"}
    }
  }
}
Can anyone point me in the right direction?
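Try a different analyzer. The default standard analyzer splits text on characters such as hyphens, so foo-bar is indexed as the two terms foo and bar, and the terms aggregation counts those terms rather than your original tags. A custom analyzer built on the keyword tokenizer keeps each tag as a single term, while the lowercase and trim filters normalize case and surrounding whitespace. An existing field's analyzer can't be changed in place, so create a new index with settings like these and reindex your documents into it: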
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_keyword_lowercase": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim"
          ]
        }
      }
    }
  },
  "mappings": {
    "asset" : {
      "properties" : {
        "name" : {"type" : "string"},
        "path" : {"type" : "string", "index" : "not_analyzed"},
        "url": {"type" : "string"},
        "tags" : {"type" : "string", "index_name" : "tag", "analyzer":"my_keyword_lowercase"},
        "created": {"type" : "date"},
        "updated": {"type" : "date"},
        "usages": {"type" : "string", "index_name" : "usage"},
        "meta": {"type": "object"}
      }
    }
  }
}
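The analyzer chain above can be approximated in Python to show which term each tag would become. This is a rough emulation, not a call to Elasticsearch: my_keyword_lowercase here is a hypothetical helper named after the analyzer, and it assumes Python's lower() and strip() behave like the lowercase and trim token filters for plain ASCII tags.

```python
from typing import List


def my_keyword_lowercase(text: str) -> List[str]:
    """Hypothetical helper: rough emulation of keyword tokenizer + lowercase + trim."""
    # keyword tokenizer: the entire input becomes one token, hyphens included
    tokens = [text]
    # "lowercase" token filter
    tokens = [t.lower() for t in tokens]
    # "trim" token filter: drop leading and trailing whitespace
    tokens = [t.strip() for t in tokens]
    return tokens


for raw in ["foo-bar", "Foo-Bar", "  foo-bar ", "cheese"]:
    print(repr(raw), "->", my_keyword_lowercase(raw))
```

All three spellings of foo-bar end up as the same single term, so they share one aggregation bucket. The keyword tokenizer on its own would keep the hyphen but would still treat differences in case or surrounding whitespace as distinct tags, which is why the two filters are added.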