Junio Branco

Reputation: 38

Query Elasticsearch where the indexed word has a space

I recently tried to use Elasticsearch. However, I am struggling to query for the following scenario. I have my index set up with this:

"analysis": {
    "index_analyzer": {
        "my_index_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["standard", "lowercase", "nGram"],
            "char-filter": ["my_pattern"]
        }
    },
    "search_analyzer": {
        "my_search_analyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["standard", "lowercase", "nGram"],
            "char-filter": ["my_pattern"]
        }
    },
    "filter": {
        "nGram": {
            "type": "nGram",
            "min_gram": 3,
            "max_gram": 40
        }
    },
    "char_filter" : {
        "my_pattern":{
            "type":"pattern_replace",
            "pattern":"\u0020",
            "replacement":""
        }
    }

And the documents that are indexed are:

{
   name:'My self'
},
{
   name:'Hell o'
}

If I search for Myself, I expect it to return the first JSON object; however, this is not happening.

I am searching using this (where term is just the string being searched):

var query = {
    match: {
        location: term
    }
};

client.search({
    index: 'requests',
    analyzer: 'my_search_analyzer',
    body: {
        query: query
    }
});

I would really appreciate some guidance on this!

Kind regards, JB

Upvotes: 0

Views: 200

Answers (1)

Val

Reputation: 217274

You are almost there; your index definition just has a few small issues and typos, which we'll fix:

  1. You don't need index_analyzer and search_analyzer; simply define my_index_analyzer and my_search_analyzer directly under the analyzer element.
  2. char-filter should read char_filter (with an underscore).
  3. Your space pattern needs an additional backslash.

Here are the corrected settings/mappings that I used:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_index_analyzer": {         <--- 1. directly under analyzer
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "nGram"
          ],
          "char_filter": [             <--- 2. underscore
            "my_pattern"
          ]
        },
        "my_search_analyzer": {        <--- 1. directly under analyzer
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "standard",
            "lowercase",
            "nGram"
          ],
          "char_filter": [             <--- 2. underscore
            "my_pattern"
          ]
        }
      },
      "filter": {
        "nGram": {
          "type": "nGram",
          "min_gram": 3,
          "max_gram": 40
        }
      },
      "char_filter": {
        "my_pattern": {
          "type": "pattern_replace",
          "pattern": "\\u0020",        <--- 3. additional backslash
          "replacement": ""
        }
      }
    }
  },
  "mappings": {
    "request": {
      "properties": {
        "location": {
          "type": "string",
          "index_analyzer": "my_index_analyzer"
        }
      }
    }
  }
}
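
One way to apply these settings is to save the block above to a local file (here called settings.json, which is just an example name) and create the index with it. This is only a rough sketch, assuming an Elasticsearch 1.x cluster on localhost as in the curl examples below:

# remove any existing index with the same name first, since analysis settings can't simply be changed on a live index
curl -XDELETE localhost:9200/requests

# create the index, reading the settings/mappings above from settings.json
curl -XPUT localhost:9200/requests -d @settings.json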

Then you can index your two sample documents:

curl -XPUT localhost:9200/requests/request/1 -d '{"location":"My self"}'
curl -XPUT localhost:9200/requests/request/2 -d '{"location":"Hell o"}'
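
If you search immediately after indexing, the documents may not be searchable yet; forcing a refresh first makes the example below reproducible (again assuming the same localhost cluster):

curl -XPOST localhost:9200/requests/_refresh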

And you'll get what you expect:

curl -XPOST localhost:9200/requests/request/_search -d '{
  "query": {
    "match": {
      "location": "Myself"
    }
  }
}'

will return the document with My self.
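
As a sanity check, the _analyze API shows what the index analyzer actually produces for a given string; here is a quick sketch using the 1.x-style query-string form:

curl -XGET 'localhost:9200/requests/_analyze?analyzer=my_index_analyzer' -d 'My self'

The char_filter removes the space before tokenization, so My self is indexed as lowercase nGram tokens that include the full token myself, which is why the Myself query above can match it.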

Upvotes: 2
