LMH

Reputation: 949

Ignoring Apostrophes (Possessive) In ElasticSearch

I'm trying to get user-submitted queries for "Joe Frankles", "Joe Frankle", and "Joe Frankle's" to all match the original text "Joe Frankle's". Right now we're indexing the field this text is in with (Tire / Ruby format):

{ :type => 'string', :analyzer => 'snowball' }

and searching with:

query { string downcased_query, :default_operator => 'AND' }

I tried this unsuccessfully:

create :settings => {
  :analysis => {
    :char_filter => {
      :remove_accents => {
        :type => "mapping",
        :mappings => ["`=>", "'=>"]
      }
    },
    :analyzer => {
      :myanalyzer => {
        :type => 'custom',
        :tokenizer => 'standard',
        :char_filter => ['remove_accents'],
        :filter => ['standard', 'lowercase', 'stop', 'snowball', 'ngram']
      }
    },
    :default => {
      :type => 'myanalyzer'
    }
  }
},

Upvotes: 4

Views: 4087

Answers (3)

Simon Steinberger

Reputation: 6825

There are two official ways of handling possessive apostrophes:

1) Use the "possessive_english" stemmer as described in the ES docs: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html

Example:

{
  "index" : {
    "analysis" : {
        "analyzer" : {
            "my_analyzer" : {
                "tokenizer" : "standard",
                "filter" : ["standard", "lowercase", "my_stemmer"]
            }
        },
        "filter" : {
            "my_stemmer" : {
                "type" : "stemmer",
                "name" : "possessive_english"
            }
        }
    }
  }
}

You can use other stemmers or snowball in addition to the "possessive_english" filter if you like. It should work, but the code is untested.

2) Use the "word_delimiter" filter:

{
  "index" : {
    "analysis" : {
        "analyzer" : {
            "my_analyzer" : {
                "tokenizer" : "standard",
                "filter" : ["standard", "lowercase", "my_word_delimiter"]
            }
        },
        "filter" : {
            "my_word_delimiter" : {
                "type" : "word_delimiter",
                "preserve_original": "true"
            }
        }
    }
  }
}

Works for me :-) ES docs: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html

Both will cut off "'s".
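To make the difference concrete, here's a small Ruby sketch (not Elasticsearch code; the real filters run inside Lucene) that simulates the observable effect of the standard tokenizer + lowercase + "possessive_english" chain. The helper name `simulate_possessive_english` is mine, for illustration:

```ruby
# Rough simulation of: standard tokenizer -> lowercase -> possessive_english.
# The split on whitespace is a simplified stand-in for the standard tokenizer,
# which keeps "Frankle's" together as a single token.
def simulate_possessive_english(text)
  text.downcase                                # lowercase token filter
      .split(/\s+/)                            # simplified tokenization
      .map { |t| t.sub(/['\u2019]s\z/, '') }   # strip trailing 's or ’s
end

simulate_possessive_english("Joe Frankle's")   # => ["joe", "frankle"]
```

Note that "possessive_english" only removes the possessive suffix; it does not conflate "frankles" with "frankle". If you also need plural/stem matching (as in the question), keep a stemmer such as snowball in the filter chain after it.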

Upvotes: 4

Yeggeps

Reputation: 2105

I ran into a similar problem; the snowball analyzer alone didn't work for me. I don't know whether it's supposed to. Here's what I use:

properties: {
  name: {
    boost: 10,
    type:  'multi_field',
    fields: {
      name:      { type: 'string', index: 'analyzed', analyzer: 'title_analyzer' },
      untouched: { type: 'string', index: 'not_analyzed' }
    }
  }
}

analysis: {
  char_filter: {
    remove_accents: {
      type: "mapping",
      mappings: ["`=>", "'=>"]
    }
  },
  filter: {},
  analyzer: {
    title_analyzer: {
      type: 'custom',
      tokenizer: 'standard',
      char_filter: ['remove_accents'],
    }
  }
}

The admin indices Analyze API is also a great tool when working with analyzers.
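For example, against a running cluster you can ask the Analyze API which tokens an analyzer actually emits (the index name `myindex` and `localhost:9200` are assumptions; swap in your own):

```shell
# Show the tokens title_analyzer produces for the problem string.
curl -XGET "localhost:9200/myindex/_analyze?analyzer=title_analyzer" \
  -d "Joe Frankle's"
```

Comparing the tokens produced at index time with the tokens produced from a user query is usually the fastest way to see why a match fails.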

Upvotes: 1

imotov

Reputation: 30163

It looks like your query is searching the _all field, but your analyzer is applied only to the individual field. To enable this behavior for the _all field, simply make snowball your default analyzer.
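A hedged sketch of what that settings change might look like (untested; follows the JSON shape of the other answers):

```json
{
  "index": {
    "analysis": {
      "analyzer": {
        "default": {
          "type": "snowball"
        }
      }
    }
  }
}
```

With the default analyzer set this way, the _all field is analyzed with snowball too, so queries against _all see the same stemming as the individual field.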

Upvotes: 0
