justiceorjustus

Reputation: 1965

.NET ElasticSearch NEST - NGram Analyzer on Multiple Fields for Partial Matches

I'm trying to use Elasticsearch for partial matches on multiple fields using an NGram analyzer, but I get 0 results after I build the index. This isn't coming naturally to me, and I can't get NGram working for even one field. This is a passion project, and I really want the new search to handle partial word matches. I tried using fuzziness, but it scored incorrect matches too high.

Index Create:

var nGramFilters = new List<string> { "lowercase", "asciifolding", "nGram_filter" };

Client.Indices.Create(CurrentIndexName, c => c
    .Settings(st => st
        .Analysis(an => an // https://stackoverflow.com/questions/38065966/token-chars-mapping-to-ngram-filter-elasticsearch-nest
            .Analyzers(anz => anz
                .Custom("ngram_analyzer", cc => cc
                    .Tokenizer("ngram_tokenizer")
                    .Filters(nGramFilters))
            )
            .Tokenizers(tz => tz
                .NGram("ngram_tokenizer", td => td
                    .MinGram(2)
                    .MaxGram(20)
                    .TokenChars(
                        TokenChar.Letter,
                        TokenChar.Digit,
                        TokenChar.Punctuation,
                        TokenChar.Symbol
                    )
                )
            )
        )
    )
    .Map<Package>(map => map
        .AutoMap()
        .Properties(p => p
            .Text(t => t
                .Name(n => n.Title)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.Summary)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PestControlledBy)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideControlsThesePests)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideInstructions)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideActiveIngredients)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticidesContainingThisActiveIngredient)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideSafeOn)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
            .Text(t => t
                .Name(n => n.PesticideNotSafeOn)
                .Fields(f => f
                    .Keyword(k => k
                        .Name("keyword")
                        .IgnoreAbove(256)
                    )
                    .Text(tt => tt
                        .Name("ngram")
                        .Analyzer("ngram_analyzer")
                    )
                )
            )
        )
    )
);

Query:

var result = _client.Search<Package>(s => s
    .From((form.Page - 1) * form.PageSize)
    .Size(form.PageSize)
    .Query(query => query
        .MultiMatch(m => m
            .Fields(f => f
                .Field(p => p.Title.Suffix("ngram"), 1.5)
                .Field(p => p.Summary.Suffix("ngram"), 1.1)
                .Field(p => p.PestControlledBy.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideControlsThesePests.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideInstructions.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideActiveIngredients.Suffix("ngram"), 1.0)
                .Field(p => p.PesticidesContainingThisActiveIngredient.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideSafeOn.Suffix("ngram"), 1.0)
                .Field(p => p.PesticideNotSafeOn.Suffix("ngram"), 1.0)
            )
            .Operator(Operator.Or) // https://stackoverflow.com/questions/46139028/elasticsearch-how-to-do-a-partial-match-from-your-query
            .Query(form.Query)
        )
    )
    .Highlight(h => h
        .PreTags("<strong>")
        .PostTags("</strong>")
        .Encoder(HighlighterEncoder.Html) // https://github.com/elastic/elasticsearch-net/issues/3091
        .Fields(
            fs => fs.Field(f => f.Summary.Suffix("ngram")),
            fs => fs.Field(p => p.PestControlledBy.Suffix("ngram")),
            fs => fs.Field(p => p.PesticideControlsThesePests.Suffix("ngram")),
            fs => fs.Field(p => p.PesticideInstructions.Suffix("ngram")),
            fs => fs.Field(p => p.PesticideActiveIngredients.Suffix("ngram")),
            fs => fs.Field(p => p.PesticidesContainingThisActiveIngredient.Suffix("ngram")),
            fs => fs.Field(p => p.PesticideSafeOn.Suffix("ngram")),
            fs => fs
                .Field(p => p.PesticideNotSafeOn.Suffix("ngram"))
                .NumberOfFragments(10)
                .FragmentSize(250)
        )
    )
);

Am I even in the right ballpark? I tried using the default analyzer, but I don't match "cat dandelion" for "cat's ear dandelion" and things like that. With the default analyzer, the whole word has to match, but I want partial matches so that searches like "petal" and "petals" find each other. Any step in the right direction is appreciated. I'm completely new to Elasticsearch and NEST and have only been working with them for a week or so.

Upvotes: 1

Views: 1389

Answers (1)

Rob

Reputation: 9979

The client.Indices.Create call is invalid, for two reasons:

  1. By default, the difference between MinGram and MaxGram can't be greater than 1, which is why you get this error:
Elasticsearch.Net.ElasticsearchClientException: Request failed to execute. Call: Status code 400 from: PUT /my_index1?pretty=true&error_trace=true. ServerError: Type: illegal_argument_exception Reason: "The difference between max_gram and min_gram in NGram Tokenizer must be less than or equal to: [1] but was [18]. This limit can be set by changing the [index.max_ngram_diff] index level setting."

You can read more about this error here.

  2. There is no filter named nGram_filter; you will need to change it to ngram.
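One way both fixes could look together is sketched below. It assumes NEST 7.x; the Package class is cut down to a single Title property, and the IndexSetup class and CreateIndex method are hypothetical names used only for illustration. The value 18 is the difference reported in the error message (20 - 2).

```csharp
using Nest;

// Hypothetical, simplified stand-in for the question's Package type.
public class Package
{
    public string Title { get; set; }
}

// Hypothetical helper; not part of NEST.
public static class IndexSetup
{
    public static CreateIndexResponse CreateIndex(IElasticClient client, string indexName)
    {
        return client.Indices.Create(indexName, c => c
            .Settings(st => st
                // Fix 1: max_gram - min_gram is 18, so raise the limit from its default of 1.
                .Setting("index.max_ngram_diff", 18)
                .Analysis(an => an
                    .Analyzers(anz => anz
                        .Custom("ngram_analyzer", cc => cc
                            .Tokenizer("ngram_tokenizer")
                            // Fix 2: reference the built-in "ngram" filter instead of
                            // the undefined "nGram_filter".
                            .Filters("lowercase", "asciifolding", "ngram")))
                    .Tokenizers(tz => tz
                        .NGram("ngram_tokenizer", td => td
                            .MinGram(2)
                            .MaxGram(20)
                            .TokenChars(
                                TokenChar.Letter,
                                TokenChar.Digit,
                                TokenChar.Punctuation,
                                TokenChar.Symbol)))))
            .Map<Package>(map => map
                .AutoMap()
                .Properties(p => p
                    .Text(t => t
                        .Name(n => n.Title)
                        .Fields(f => f
                            .Text(tt => tt
                                .Name("ngram")
                                .Analyzer("ngram_analyzer")))))));
    }
}
```

The remaining fields from the question would be mapped the same way as Title. An existing index has to be deleted and recreated for the new analysis settings and mapping to take effect.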

I discovered these problems by checking the index mapping in Elasticsearch (localhost:9200/YOUR_INDEX_NAME/_mapping), where I found that the mapping hadn't been applied. The second step was to see what DebugInformation on the index creation response had to say:

var createIndexResponse = await client.Indices.CreateAsync("my_index1", ..);
Console.WriteLine(createIndexResponse.DebugInformation);
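As a slightly fuller sketch of that check, assuming NEST 7.x: the IndexDiagnostics class and CreateAndReportAsync method are hypothetical names, and the empty `c => c` selector stands in for the real settings and mappings.

```csharp
using System;
using System.Threading.Tasks;
using Nest;

// Hypothetical helper; not part of NEST.
public static class IndexDiagnostics
{
    public static async Task<bool> CreateAndReportAsync(IElasticClient client, string indexName)
    {
        // Placeholder selector: the real call would pass the settings and mappings here.
        var createIndexResponse = await client.Indices.CreateAsync(indexName, c => c);

        if (!createIndexResponse.IsValid)
        {
            // Short reason returned by the server, if there is one.
            Console.WriteLine(createIndexResponse.ServerError?.Error?.Reason);
            // Full request/response trace collected by the client.
            Console.WriteLine(createIndexResponse.DebugInformation);
        }

        return createIndexResponse.IsValid;
    }
}
```

Checking IsValid on every response is useful here because, with default connection settings, NEST does not throw on a 400 from the server, so a failed index creation can otherwise go unnoticed until searches return nothing.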

Hope that helps.

Upvotes: 3
