justcompile

Elasticsearch aggregation with hyphenated values splitting into separate values

I'm trying to retrieve an aggregation of tags (with counts) from Elasticsearch, but hyphenated tags are being split and returned as separate tags.

For example, with documents like:

{
    "tags": ["foo", "foo-bar", "cheese"]
}

I get back (abridged):

{
  "foo": 8,
  "bar": 3,
  "cheese": 2
}

when I expect to get:

{
  "foo": 5,
  "foo-bar": 3,
  "cheese": 2
}
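The inflated "foo" count is what you would expect if each tag is tokenized before it is counted: "foo-bar" becomes the two terms "foo" and "bar", so every "foo-bar" document also counts towards "foo" (5 + 3 = 8). The Python sketch below simulates this. It assumes a hypothetical corpus of 5 documents tagged "foo", 3 tagged "foo-bar" and 2 tagged "cheese" (consistent with the counts above), approximates the standard analyzer with a simple regex split, and mimics a terms aggregation by counting each term once per document. None of these helpers are Elasticsearch APIs.

```python
import re
from collections import Counter

# Hypothetical corpus consistent with the counts above:
# 5 docs tagged "foo", 3 tagged "foo-bar", 2 tagged "cheese".
docs = [["foo"]] * 5 + [["foo-bar"]] * 3 + [["cheese"]] * 2

def standard_like_tokens(value):
    """Rough stand-in for the standard analyzer: lowercase, then split on
    any non-word character, so '-' acts as a token boundary.
    (The real analyzer uses Unicode text segmentation; this is simplified.)"""
    return re.findall(r"\w+", value.lower())

def terms_agg(docs, analyze):
    """Mimic a terms aggregation: each distinct term counts once per document."""
    counts = Counter()
    for tags in docs:
        terms = {tok for tag in tags for tok in analyze(tag)}
        counts.update(terms)
    return dict(counts)

result = terms_agg(docs, standard_like_tokens)
print(result)  # {'foo': 8, 'bar': 3, 'cheese': 2}
```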

My mapping is:

{
    "asset" : {
        "properties" : {
            "name" : {"type" : "string"},
            "path" : {"type" : "string", "index" : "not_analyzed"},
            "url": {"type" : "string"},
            "tags" : {"type" : "string", "index_name" : "tag"},
            "created": {"type" : "date"},
            "updated": {"type" : "date"},
            "usages": {"type" : "string", "index_name" : "usage"},
            "meta": {"type": "object"}
        }
    }
}

Can anyone point me in the right direction?

Answers (1)

Andrei Stefan

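Use a different analyzer instead of the default "standard" analyzer, which splits text into separate tokens at word boundaries such as hyphens. A custom analyzer built on the "keyword" tokenizer keeps each tag as a single token while still lowercasing and trimming it: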

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_keyword_lowercase": {
          "tokenizer": "keyword",
          "filter": [
            "lowercase",
            "trim"
          ]
        }
      }
    }
  },
  "mappings": {
    "asset" : {
        "properties" : {
            "name" : {"type" : "string"},
            "path" : {"type" : "string", "index" : "not_analyzed"},
            "url": {"type" : "string"},
            "tags" : {"type" : "string", "index_name" : "tag", "analyzer":"my_keyword_lowercase"},
            "created": {"type" : "date"},
            "updated": {"type" : "date"},
            "usages": {"type" : "string", "index_name" : "usage"},
            "meta": {"type": "object"}
        }
    }
  }
}
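To see why this fixes the counts, here is a rough Python approximation of the "my_keyword_lowercase" analyzer: the keyword tokenizer emits the whole value as one token, then the lowercase and trim filters normalize it. It reuses the same hypothetical corpus (5 "foo", 3 "foo-bar", 2 "cheese") and a hand-written stand-in for the terms aggregation; this is a sketch of the behaviour, not Elasticsearch code.

```python
from collections import Counter

# Hypothetical corpus: 5 docs tagged "foo", 3 "foo-bar", 2 "cheese".
docs = [["foo"]] * 5 + [["foo-bar"]] * 3 + [["cheese"]] * 2

def keyword_lowercase_trim(value):
    """Approximates my_keyword_lowercase: no splitting (keyword tokenizer),
    then lowercase, then strip surrounding whitespace (trim)."""
    return [value.lower().strip()]

def terms_agg(docs, analyze):
    """Mimic a terms aggregation: each distinct term counts once per document."""
    counts = Counter()
    for tags in docs:
        terms = {tok for tag in tags for tok in analyze(tag)}
        counts.update(terms)
    return dict(counts)

result = terms_agg(docs, keyword_lowercase_trim)
print(result)  # {'foo': 5, 'foo-bar': 3, 'cheese': 2}
print(keyword_lowercase_trim("  Foo-Bar "))  # ['foo-bar']
```

Because the analyzer is set in the mapping, it only takes effect for a newly created index; existing documents have to be reindexed before the aggregation returns the expected counts.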
