Reputation: 949
I'm trying to get the user-submitted queries "Joe Frankles", "Joe Frankle", and "Joe Frankle's" to all match the original text "Joe Frankle's". Right now we're indexing the field that holds this text with (Tire / Ruby format):
{ :type => 'string', :analyzer => 'snowball' }
and searching with:
query { string downcased_query, :default_operator => 'AND' }
I tried this unsuccessfully:
create :settings => {
  :analysis => {
    :char_filter => {
      :remove_accents => {
        :type => "mapping",
        :mappings => ["`=>", "'=>"]
      }
    },
    :analyzer => {
      :myanalyzer => {
        :type => 'custom',
        :tokenizer => 'standard',
        :char_filter => ['remove_accents'],
        :filter => ['standard', 'lowercase', 'stop', 'snowball', 'ngram']
      }
    },
    :default => {
      :type => 'myanalyzer'
    }
  }
},
Upvotes: 4
Views: 4087
Reputation: 6825
There are two official ways of handling possessive apostrophes:
1) Use the "possessive_english" stemmer as described in the ES docs: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-stemmer-tokenfilter.html
Example:
{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "my_analyzer" : {
          "tokenizer" : "standard",
          "filter" : ["standard", "lowercase", "my_stemmer"]
        }
      },
      "filter" : {
        "my_stemmer" : {
          "type" : "stemmer",
          "name" : "possessive_english"
        }
      }
    }
  }
}
You can add other stemmers or snowball alongside the "possessive_english" filter if you like. This should work, but the code is untested.
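Since the question uses Tire, here is a rough sketch of the same settings as a Ruby hash, with snowball appended after the possessive filter. The method name `possessive_settings` is made up for illustration, the filter order is one reasonable choice rather than a requirement, and like the JSON above it has not been run against a cluster.

```ruby
require 'json'

# Hypothetical helper: returns the analysis settings from the JSON example
# above as a Ruby hash, with snowball stemming added after the possessive filter.
def possessive_settings
  {
    :analysis => {
      :analyzer => {
        :my_analyzer => {
          :type      => 'custom',
          :tokenizer => 'standard',
          # Strip the possessive first, then stem what is left.
          :filter    => ['standard', 'lowercase', 'my_stemmer', 'snowball']
        }
      },
      :filter => {
        :my_stemmer => { :type => 'stemmer', :name => 'possessive_english' }
      }
    }
  }
end

# With Tire this hash would be passed as `create :settings => possessive_settings`,
# as in the question. Here it is only printed so the structure can be checked.
puts JSON.pretty_generate(possessive_settings)
```

The field mapping would then need to reference 'my_analyzer' instead of 'snowball'.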
2) Use the "word_delimiter" filter:
{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "my_analyzer" : {
          "tokenizer" : "standard",
          "filter" : ["standard", "lowercase", "my_word_delimiter"]
        }
      },
      "filter" : {
        "my_word_delimiter" : {
          "type" : "word_delimiter",
          "preserve_original": "true"
        }
      }
    }
  }
}
Works for me :-) ES docs: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-word-delimiter-tokenfilter.html
Both will cut off "'s".
Upvotes: 4
Reputation: 2105
I ran into a similar problem: the snowball analyzer alone didn't work for me. I don't know whether it's supposed to or not. Here's what I use:
properties: {
  name: {
    boost: 10,
    type: 'multi_field',
    fields: {
      name: { type: 'string', index: 'analyzed', analyzer: 'title_analyzer' },
      untouched: { type: 'string', index: 'not_analyzed' }
    }
  }
}
analysis: {
  char_filter: {
    remove_accents: {
      type: "mapping",
      mappings: ["`=>", "'=>"]
    }
  },
  filter: {},
  analyzer: {
    title_analyzer: {
      type: 'custom',
      tokenizer: 'standard',
      char_filter: ['remove_accents'],
    }
  }
}
The analyze tool in the indices admin API is also great when working with analyzers, since it shows the tokens an analyzer actually produces.
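As a rough sketch of how the analyze endpoint could be called from Ruby: the helpers below build and send an `_analyze` request with plain Net::HTTP. The method names, the `my_index` index name, and the localhost URL are placeholders. It also assumes a JSON request body, which newer Elasticsearch versions expect; older versions took the analyzer as a query parameter with the raw text as the body, so adjust for your version.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Build (but do not send) an _analyze request for one piece of text.
# Host, index and analyzer names are placeholders.
def build_analyze_request(index, analyzer, text, host = 'http://localhost:9200')
  uri = URI.parse("#{host}/#{index}/_analyze")
  request = Net::HTTP::Post.new(uri.path, 'Content-Type' => 'application/json')
  request.body = JSON.generate('analyzer' => analyzer, 'text' => text)
  [uri, request]
end

# Send the request and return just the token strings from the response.
def analyze(index, analyzer, text)
  uri, request = build_analyze_request(index, analyzer, text)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  JSON.parse(response.body).fetch('tokens', []).map { |t| t['token'] }
end
```

Calling `analyze('my_index', 'title_analyzer', "Joe Frankle's")` and then the same for "Joe Frankles" and "Joe Frankle" lets you compare the token lists side by side. If they differ, the queries will not match the way you want.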
Upvotes: 1
Reputation: 30163
It looks like your query is searching the _all field, but your analyzer is applied only to the individual field. To enable this functionality for the _all field, simply make snowball your default analyzer.
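In Tire's hash format that might look like the sketch below. The method name `default_snowball_settings` is made up for illustration. Note that the analyzer named `default` goes inside `:analyzer`, whereas in the question's attempt `:default` sits beside `:analyzer`.

```ruby
require 'json'

# Hypothetical helper: index settings that register snowball as the
# default analyzer for the index.
def default_snowball_settings
  {
    :analysis => {
      :analyzer => {
        # An analyzer registered under the name "default" is used for
        # fields that do not specify their own analyzer.
        :default => { :type => 'snowball' }
      }
    }
  }
end

# With Tire: `create :settings => default_snowball_settings`.
puts JSON.pretty_generate(default_snowball_settings)
```

The index has to be recreated and the documents reindexed for a changed analyzer to take effect on existing data.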
Upvotes: 0