Anna

Reputation: 889

Whitespace in queries

I have an analyzer that ignores whitespace. When I search for a string without a space, it returns proper results. This is the analyzer:

{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "word_joiner": {
          "type": "word_delimiter",
          "catenate_all": true
        }
      },
      "analyzer": {
        "word_join_analyzer": {
          "type": "custom",
          "filter": [
            "word_joiner"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  }
}
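For completeness, these analysis settings are supplied when the index is created. A minimal sketch, assuming a local node and the index name cake used in the _analyze call below:

```shell
# Create the index with the custom analyzer (assumes a local
# Elasticsearch node and the index name "cake" used below).
curl -XPUT "http://localhost:9200/cake" -d '{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "word_joiner": {
          "type": "word_delimiter",
          "catenate_all": true
        }
      },
      "analyzer": {
        "word_join_analyzer": {
          "type": "custom",
          "filter": ["word_joiner"],
          "tokenizer": "keyword"
        }
      }
    }
  }
}'
```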

This is how it works:

curl -XGET "http://localhost:9200/cake/_analyze?analyzer=word_join_analyzer&pretty" -d 'ONE"\ "TWO'

Result:

{
  "tokens" : [ {
    "token" : "ONE",
    "start_offset" : 1,
    "end_offset" : 5,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "ONETWO",
    "start_offset" : 1,
    "end_offset" : 13,
    "type" : "word",
    "position" : 0
  }, {
    "token" : "TWO",
    "start_offset" : 7,
    "end_offset" : 13,
    "type" : "word",
    "position" : 1
  } ]
}

What I want is to also get a "token" : "ONE TWO" from this analyzer. How can I do this?
Thanks!

Upvotes: 0

Views: 51

Answers (1)

Val

Reputation: 217274

You need to enable the preserve_original setting, which is false by default:

{
  "index": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "word_joiner": {
          "type": "word_delimiter",
          "catenate_all": true,
          "preserve_original": true           <--- add this
        }
      },
      "analyzer": {
        "word_join_analyzer": {
          "type": "custom",
          "filter": [
            "word_joiner"
          ],
          "tokenizer": "keyword"
        }
      }
    }
  }
}
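If the cake index already exists, analysis settings cannot be changed while it is open. A minimal sketch of applying the updated filter (assuming a local node and the same index/analyzer names as above):

```shell
# Analysis settings can only be updated on a closed index,
# so close "cake", push the updated filter, then reopen it.
curl -XPOST "http://localhost:9200/cake/_close"
curl -XPUT "http://localhost:9200/cake/_settings" -d '{
  "analysis": {
    "filter": {
      "word_joiner": {
        "type": "word_delimiter",
        "catenate_all": true,
        "preserve_original": true
      }
    }
  }
}'
curl -XPOST "http://localhost:9200/cake/_open"

# Re-run the _analyze call to verify the extra "ONE TWO" token.
curl -XGET "http://localhost:9200/cake/_analyze?analyzer=word_join_analyzer&pretty" -d 'ONE TWO'
```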

This will yield:

{
  "tokens": [
    {
      "token": "ONE TWO",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 0
    },
    {
      "token": "ONE",
      "start_offset": 0,
      "end_offset": 3,
      "type": "word",
      "position": 0
    },
    {
      "token": "ONETWO",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 0
    },
    {
      "token": "TWO",
      "start_offset": 4,
      "end_offset": 7,
      "type": "word",
      "position": 1
    }
  ]
}

Upvotes: 2
