I'm trying to use Elasticsearch for partial matches on multiple fields using NGram, but I get 0 results after I build the index. This is not coming very naturally to me, and I can't seem to get NGram working for even one field. This is a passion project for me, and I really want the new search to handle partial word matches. I tried using fuzziness, but it started scoring incorrect matches too high.
Index Create:
var nGramFilters = new List<string> { "lowercase", "asciifolding", "nGram_filter" };
Client.Indices.Create(CurrentIndexName, c => c
    .Settings(st => st
        .Analysis(an => an // https://stackoverflow.com/questions/38065966/token-chars-mapping-to-ngram-filter-elasticsearch-nest
            .Analyzers(anz => anz
                .Custom("ngram_analyzer", cc => cc
                    .Tokenizer("ngram_tokenizer")
                    .Filters(nGramFilters))
            )
            .Tokenizers(tz => tz
                .NGram("ngram_tokenizer", td => td
                    .MinGram(2)
                    .MaxGram(20)
                    .TokenChars(
                        TokenChar.Letter,
                        TokenChar.Digit,
                        TokenChar.Punctuation,
                        TokenChar.Symbol
                    )
                )
            )
        )
    )
    .Map<Package>(map => map
        .AutoMap()
        .Properties(p => p
            .Text(t => t
                .Name(n => n.Title)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.Summary)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PestControlledBy)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideControlsThesePests)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideInstructions)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideActiveIngredients)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticidesContainingThisActiveIngredient)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideSafeOn)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideNotSafeOn)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
        )
    )
);
Query:
var result = _client.Search<Package>(s => s
    .From((form.Page - 1) * form.PageSize)
    .Size(form.PageSize)
    .Query(query => query
        .MultiMatch(m => m
            .Fields(f => f
                .Field(p => p.Title.Suffix("ngram"), 1.5)
                .Field(p => p.Summary.Suffix("ngram"), 1.1)
                .Field(p => p.PestControlledBy.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideControlsThesePests.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideInstructions.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideActiveIngredients.Suffix("ngram"), 1.0)
                .Field(p => p.PesticidesContainingThisActiveIngredient.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideSafeOn.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideNotSafeOn.Suffix("ngram"), 1.0)
            )
            .Operator(Operator.Or) // https://stackoverflow.com/questions/46139028/elasticsearch-how-to-do-a-partial-match-from-your-query
            .Query(form.Query)
        )
    )
    .Highlight(h => h
        .PreTags("<strong>")
        .PostTags("</strong>")
        .Encoder(HighlighterEncoder.Html) //https://github.com/elastic/elasticsearch-net/issues/3091
        .Fields(fs => fs
                .Field(f => f.Summary.Suffix("ngram")),
            fs => fs
                .Field(p => p.PestControlledBy.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideControlsThesePests.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideInstructions.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideActiveIngredients.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticidesContainingThisActiveIngredient.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideSafeOn.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideNotSafeOn.Suffix("ngram"))
                .NumberOfFragments(10)
                .FragmentSize(250)
        )
    )
);
Am I even in the right ballpark? I tried using the default analyzer, but a search for "cat dandelion" doesn't find "cat's ear dandelion", and similar cases fail too. With the default analyzer the whole word has to match, but I want partial matches so that, for example, "petal" also finds "petals". Any step in the right direction is appreciated. I'm completely new to Elasticsearch and NEST and have only been working with them for a week or so.
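To see why n-grams are the right tool for "petal" vs "petals", here is a rough, dependency-free emulation of what an ngram tokenizer emits, compared with whole-word terms. It is only an approximation, not Elasticsearch's implementation: the hypothetical helpers Words and NGrams split on anything that is not a letter or digit (standing in for token_chars Letter/Digit), lowercase, ignore asciifolding, and use MinGram 2 / MaxGram 3 to keep the output small. It uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: splits on anything that is not a letter or digit and lowercases,
// a rough stand-in for token_chars = letter, digit plus the lowercase filter.
static IEnumerable<string> Words(string text) =>
    Regex.Split(text.ToLowerInvariant(), @"[^\p{L}\p{Nd}]+").Where(t => t.Length > 0);

// Hypothetical helper: emulates an ngram tokenizer by emitting, for every word,
// each substring whose length is between minGram and maxGram.
static HashSet<string> NGrams(string text, int minGram, int maxGram)
{
    var grams = new HashSet<string>();
    foreach (var word in Words(text))
        for (int start = 0; start < word.Length; start++)
            for (int len = minGram; len <= maxGram && start + len <= word.Length; len++)
                grams.Add(word.Substring(start, len));
    return grams;
}

// Whole-word terms: "petal" and "petals" share nothing, so a whole-word match fails.
int sharedWords = Words("petal").Intersect(Words("petals")).Count();

// N-gram terms: every gram of the query "petal" also occurs in "petals".
var queryGrams = NGrams("petal", 2, 3);
var docGrams = NGrams("petals", 2, 3);
int sharedGrams = queryGrams.Intersect(docGrams).Count();

Console.WriteLine($"shared words: {sharedWords}, shared grams: {sharedGrams} of {queryGrams.Count}");
// prints "shared words: 0, shared grams: 7 of 7"
```

The same overlap is what lets the ngram subfields match partial words. The flip side is that short grams are shared by many unrelated words, which is one reason n-gram scoring can feel noisy.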
Answer:
The client.Indices.Create call is invalid, for two reasons:
1. By default, MaxGram and MinGram can't differ by more than 1 (this is the index.max_ngram_diff index setting), so MinGram(2) with MaxGram(20) gives this error:
Elasticsearch.Net.ElasticsearchClientException: Request failed to execute. Call: Status code 400 from: PUT /my_index1?pretty=true&error_trace=true. ServerError: Type: illegal_argument_exception Reason: "The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [18]. This limit can be set by changing the [index.max_ngram_diff] index level setting."
You can read more about this error here.
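The limit exists because the number of grams per word grows quickly with the spread between min and max. A small dependency-free sketch (GramCount and IsAllowed are hypothetical helpers, not NEST APIs) that counts the grams one token produces and applies the same check as the error above, assuming the default index.max_ngram_diff of 1. If you really want MinGram(2)/MaxGram(20), the error says the limit can be raised; if I remember NEST's API correctly, the generic `.Setting(...)` method on the index settings descriptor can do that, e.g. `.Settings(st => st.Setting("index.max_ngram_diff", 18) ...)`. Otherwise keep the two values within 1 of each other.

```csharp
using System;

// Hypothetical helper: how many grams an ngram tokenizer emits for one token of
// `length` characters (one gram per start position, for every allowed gram length).
static int GramCount(int length, int minGram, int maxGram)
{
    int count = 0;
    for (int len = minGram; len <= Math.Min(maxGram, length); len++)
        count += length - len + 1;
    return count;
}

// Mirrors the check in the error message: max_gram - min_gram must not exceed
// index.max_ngram_diff, whose default is 1.
static bool IsAllowed(int minGram, int maxGram, int maxNgramDiff = 1) =>
    maxGram - minGram <= maxNgramDiff;

const int wordLength = 20;
Console.WriteLine($"MinGram 2, MaxGram 3:  {GramCount(wordLength, 2, 3)} grams, allowed by default: {IsAllowed(2, 3)}");
Console.WriteLine($"MinGram 2, MaxGram 20: {GramCount(wordLength, 2, 20)} grams, allowed by default: {IsAllowed(2, 20)}");
Console.WriteLine($"MinGram 2, MaxGram 20 with index.max_ngram_diff = 18: allowed: {IsAllowed(2, 20, 18)}");
```

For a 20-character word that is 37 grams with MinGram 2/MaxGram 3 versus 190 with MinGram 2/MaxGram 20, and every one of them is a term in the index.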
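2. The analyzer references a token filter named nGram_filter, but no filter with that name is defined; you will need to change it to the built-in ngram filter.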
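One thing worth checking after that change: according to the Elasticsearch reference, the built-in ngram token filter defaults to min_gram 1 and max_gram 2, and in this analyzer it runs on every token the ngram tokenizer produced. A rough emulation of that chain (Grams is a hypothetical helper; lowercase is applied up front and asciifolding is ignored) shows which terms end up in the index:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: every substring of `token` whose length is between minGram and
// maxGram, roughly what an ngram tokenizer or ngram token filter emits for one token.
static IEnumerable<string> Grams(string token, int minGram, int maxGram)
{
    for (int start = 0; start < token.Length; start++)
        for (int len = minGram; len <= maxGram && start + len <= token.Length; len++)
            yield return token.Substring(start, len);
}

// lowercase filter, applied up front here (same result for this ASCII input);
// asciifolding is ignored because it changes nothing for "Petals".
var word = "Petals".ToLowerInvariant();

// ngram_tokenizer from the question: MinGram(2), MaxGram(20).
var fromTokenizer = Grams(word, 2, 20).ToList();

// Built-in ngram token filter at its documented defaults (min_gram 1, max_gram 2),
// applied to every token the tokenizer produced.
var afterFilter = fromTokenizer.SelectMany(g => Grams(g, 1, 2)).Distinct().ToList();

Console.WriteLine($"tokenizer: {fromTokenizer.Count} grams, longest {fromTokenizer.Max(g => g.Length)} chars");
Console.WriteLine($"after ngram filter: {afterFilter.Count} distinct terms, longest {afterFilter.Max(g => g.Length)} chars");
```

For "petals" the tokenizer emits 15 grams up to 6 characters long, but after the filter only 11 distinct 1- and 2-character terms remain, so almost any document will share terms with almost any query. If that isn't the intent, two options I'd consider: drop the filter from nGramFilters, since ngram_tokenizer already produces n-grams, or register a custom n-gram token filter with explicit MinGram/MaxGram under .TokenFilters(...) in the analysis settings and reference it by that name.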
I found these problems by checking the index mapping in Elasticsearch (localhost:9200/YOUR_INDEX_NAME/_mapping), which showed that the mapping hadn't been applied. The second step was to see what DebugInformation on the index creation response had to tell me:
var createIndexResponse = await client.Indices.CreateAsync("my_index1", ..);
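if (!createIndexResponse.IsValid)
    Console.WriteLine(createIndexResponse.DebugInformation); // request, response and server error details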
Hope that helps.