First of all, note that I am using a "News" class (Noticia, in Portuguese) that has a string property called "Content" (Conteudo, in Portuguese):
public class Noticia
{
    public string Conteudo { get; set; }
}
I am trying to create an index that ignores accents and pt-BR stopwords, and that allows up to 40 million characters to be analyzed in a highlighted query.
I can create such an index using this code:
var createIndexResponse = client.Indices.Create(indexName, c => c
    .Settings(s => s
        .Setting("highlight.max_analyzed_offset", 40000000)
        .Analysis(analysis => analysis
            .TokenFilters(tokenfilters => tokenfilters
                .AsciiFolding("folding-accent", ft => ft
                )
                .Stop("stoping-br", st => st
                    .StopWords("_brazilian_")
                )
            )
            .Analyzers(analyzers => analyzers
                .Custom("folding-analyzer", cc => cc
                    .Tokenizer("standard")
                    .Filters("folding-accent", "stoping-br")
                )
            )
        )
    )
    .Map<Noticia>(mm => mm
        .AutoMap()
        .Properties(p => p
            .Text(t => t
                .Name(n => n.Conteudo)
                .Analyzer("folding-analyzer")
            )
        )
    )
);
If I test this analyzer in Kibana Dev Tools, I get the result I want: accents removed and stopwords dropped!
POST intranet/_analyze
{
"analyzer": "folding-analyzer",
"text": "Férias de todos os funcionários"
}
Result:
{
"tokens" : [
{
"token" : "Ferias",
"start_offset" : 0,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "funcionarios",
"start_offset" : 19,
"end_offset" : 31,
"type" : "<ALPHANUM>",
"position" : 4
}
]
}
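For readers without a cluster at hand, here is a rough plain C# approximation (not Elasticsearch code) of what folding-analyzer does to that sentence. It is a sketch under stated assumptions: the regex stands in for the standard tokenizer, the five-word stop list is a hypothetical subset of "_brazilian_", and it needs .NET 6+ for top-level statements. Like the custom analyzer, it folds before removing stopwords and does not lowercase, which is why the first token keeps its capital "F".

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical, tiny subset of the _brazilian_ stop list; the real list is much longer.
var brazilianStopwords = new HashSet<string>(StringComparer.Ordinal) { "de", "do", "da", "os", "todos" };

// Rough stand-in for the asciifolding filter: decompose (é -> e + U+0301) and drop combining marks.
string FoldAccents(string word)
{
    var sb = new StringBuilder();
    foreach (char c in word.Normalize(NormalizationForm.FormD))
    {
        if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            sb.Append(c);
    }
    return sb.ToString().Normalize(NormalizationForm.FormC);
}

// Approximates folding-analyzer: split into letter/mark/digit runs (a crude "standard" tokenizer),
// fold accents, then drop stopwords. Positions still count the removed words, as in the Kibana output.
// No lowercasing, because the custom analyzer has no lowercase filter.
List<(string Token, int Position)> Analyze(string text)
{
    var result = new List<(string Token, int Position)>();
    var words = Regex.Matches(text, @"[\p{L}\p{M}\p{N}]+").Cast<Match>().Select(m => m.Value).ToList();
    for (int i = 0; i < words.Count; i++)
    {
        string folded = FoldAccents(words[i]);
        if (!brazilianStopwords.Contains(folded))
            result.Add((folded, i));
    }
    return result;
}

foreach (var (token, position) in Analyze("Férias de todos os funcionários"))
    Console.WriteLine($"{token} (position {position})");
// Ferias (position 0)
// funcionarios (position 4)
```

The token texts and positions line up with the "_analyze" response above; offsets and token types are left out to keep the sketch short.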
I get the same (good) result when I use NEST to analyze the text with my folding analyzer (the tokens "Ferias" and "funcionarios" are returned):
var analyzeResponse = client.Indices.Analyze(a => a
    .Index(indexName)
    .Analyzer("folding-analyzer")
    .Text("Férias de todos os funcionários")
);
However, if I perform a search using the NEST Elasticsearch .NET client, terms like "Férias" (with accent) and "Ferias" (without accent) are treated as different.
My goal is to run a query that returns all results, whether the word is written Férias or Ferias.
This is the simplified C# (NEST) code I am using to query Elasticsearch:
var searchResponse = ElasticClient.Search<Noticia>(s => s
    .Index(indexName)
    .Query(q => q
        .MultiMatch(m => m
            .Fields(f => f
                .Field(p => p.Titulo, 4)
                .Field(p => p.Conteudo, 2)
            )
            .Query(termo)
        )
    )
);
And this is the debug information of the API call behind searchResponse:
Successful (200) low level call on POST: /intranet/_search?pretty=true&error_trace=true&typed_keys=true
# Audit trail of this API call:
- [1] HealthyResponse: Node: ###NODE ADDRESS### Took: 00:00:00.3880295
# Request:
{"query":{"multi_match":{"fields":["categoria^1","titulo^4","ementa^3","conteudo^2","attachments.attachment.content^1"],"query":"Ferias"}},"size":100}
# Response:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 13.788051,
"hits" : [
{
"_index" : "intranet",
"_type" : "_doc",
"_id" : "4934",
"_score" : 13.788051,
"_source" : {
"conteudo" : "blablabla ferias blablabla",
"attachments" : [ ],
"categoria" : "Novidades da Biblioteca - DBD",
"publicadaEm" : "2008-10-14T00:00:00",
"titulo" : "INFORMATIVO DE DIREITO ADMINISTRATIVO E LRF - JUL/2008",
"ementa" : "blablabla",
"matriculaAutor" : 900794,
"atualizadaEm" : "2009-02-03T13:44:00",
"id" : 4934,
"indexacaoAtiva" : true,
"status" : "Disponível"
}
}
]
}
}
I have also tried using multi-fields and .Suffix() in the query, without success:
.Map<Noticia>(mm => mm
    .AutoMap()
    .Properties(p => p
        .Text(t => t
            .Name(n => n.Conteudo)
            .Analyzer("folding-analyzer")
            .Fields(f => f
                .Text(ss => ss
                    .Name("folding")
                    .Analyzer("folding-analyzer")
                )
            )
(...)
var searchResponse = ElasticClient.Search<Noticia>(s => s
    .Index(indexName)
    .Query(q => q
        .MultiMatch(m => m
            .Fields(f => f
                .Field(p => p.Titulo, 4)
                .Field(p => p.Conteudo.Suffix("folding"), 2)
            )
            .Query(termo)
        )
    )
);
Any clue what I am doing wrong or what I can do to reach my goal?
Thanks a lot in advance!
After a few days I found out what I was doing wrong: it was all about the mapping.
Here are the steps I took to approach the problem and finally solve it:
1 - First, I opened the Kibana console and found out that only the last of my mapped fields was being assigned my custom analyzer (folding-analyzer).
To check each of your fields, you can use the Get Field Mapping API with a command like this in Dev Tools:
GET /<index>/_mapping/field/<field>
Then you'll be able to see whether your analyzer is assigned to that field or not.
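If you would rather run the same check from code than from Dev Tools, the response body can be inspected as below. This is a minimal sketch, assuming the Elasticsearch 7.x response layout, .NET 6+ (top-level statements) and System.Text.Json; the AnalyzerOf helper and the sample body are hypothetical, and fetching the body over HTTP is left out.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: returns the analyzer configured for `field` in the body returned by
// GET /<index>/_mapping/field/<field>, or null when the mapping sets none
// (Elasticsearch then uses the default "standard" analyzer).
string? AnalyzerOf(string responseJson, string index, string field)
{
    using var doc = JsonDocument.Parse(responseJson);
    JsonElement mapping = doc.RootElement
        .GetProperty(index)
        .GetProperty("mappings")
        .GetProperty(field)        // keyed by the full name, e.g. "attachments.attachment.content"
        .GetProperty("mapping");   // inside, keyed by the leaf name, e.g. "content"
    foreach (JsonProperty leaf in mapping.EnumerateObject())
    {
        if (leaf.Value.TryGetProperty("analyzer", out JsonElement analyzer))
            return analyzer.GetString();
    }
    return null;
}

// Hand-written sample in the 7.x response shape (assumed, not captured from a real cluster).
const string sample = @"{
  ""intranet"": {
    ""mappings"": {
      ""conteudo"": {
        ""full_name"": ""conteudo"",
        ""mapping"": { ""conteudo"": { ""type"": ""text"", ""analyzer"": ""folding-analyzer"" } }
      }
    }
  }
}";

Console.WriteLine(AnalyzerOf(sample, "intranet", "conteudo") ?? "standard (default)");
// folding-analyzer
```

Looping over every mapped field with a helper like this makes it quick to spot the fields that silently fell back to "standard".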
2 - After that, I discovered that the last field was the only one assigned to my custom analyzer, because I was misusing the fluent mapping in two ways:
The correct mapping that worked for me looked roughly like this:
.Map<Noticia>(mm => mm
    .AutoMap()
    .Properties(p => p
        .Text(t => t
            .Name(n => n.Field1)
            .Analyzer("folding-analyzer")
        )
        .Text(t => t
            .Name(n => n.Field2)
            .Analyzer("folding-analyzer")
        )
        .Object<NoticiaArquivo>(o => o
            .Name(n => n.Arquivos)
            .Properties(eps => eps
                .Text(s => s
                    .Name(e => e.NAField1)
                    .Analyzer("folding-analyzer")
                )
                .Text(s => s
                    .Name(e => e.NAField2)
                    .Analyzer("folding-analyzer")
                )
            )
        )
    )
)
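Why every field has to be listed can be shown with a toy model (plain C#, not NEST): a text field that is not given folding-analyzer is analyzed with the default standard analyzer, which lowercases but keeps accents, so "férias" and "ferias" remain different terms in that field. The field names, the single-term "analyzers" and the AnalyzerFor/Matches helpers are hypothetical simplifications.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Toy single-term analyzers (no tokenization):
// "standard" lowercases; folding-analyzer strips accents (it has no lowercase filter).
string Standard(string term) => term.ToLowerInvariant();
string Folding(string term) => new string(term.Normalize(NormalizationForm.FormD)
        .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
        .ToArray())
    .Normalize(NormalizationForm.FormC);

// Hypothetical mapping: only the fields listed here were given folding-analyzer;
// any other text field falls back to the default "standard" analyzer.
var fieldAnalyzers = new Dictionary<string, Func<string, string>>
{
    ["conteudo"] = Folding,
};

Func<string, string> AnalyzerFor(string field) =>
    fieldAnalyzers.TryGetValue(field, out var analyzer) ? analyzer : Standard;

// A stored term matches when it analyzes to the same token as the query term.
bool Matches(string field, string storedTerm, string queryTerm) =>
    AnalyzerFor(field)(storedTerm) == AnalyzerFor(field)(queryTerm);

Console.WriteLine(Matches("conteudo", "férias", "ferias")); // True: both sides fold to "ferias"
Console.WriteLine(Matches("titulo", "férias", "ferias"));   // False: "standard" keeps the accent
```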
Finally, note that when you assign an analyzer with the .Analyzer("analyzerName") clause, you are telling Elasticsearch to use that analyzer both at index time and at search time.
If you want an analyzer to be used only at search time and not at index time, use the .SearchAnalyzer("analyzerName") clause (the index-time analyzer is still the one set with .Analyzer(...)).
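The reason the two sides have to agree can be sketched as a toy comparison (plain C#; the Fold, Keep and Hits helpers are hypothetical): a query only hits when the search-time analyzer produces the same term the index-time analyzer stored. The second case assumes an index-time analyzer that does not fold at all.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Toy single-term analyzers: Fold strips accents, Keep leaves the term untouched.
string Fold(string term) => new string(term.Normalize(NormalizationForm.FormD)
        .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
        .ToArray())
    .Normalize(NormalizationForm.FormC);
string Keep(string term) => term;

// The stored term goes through the index-time analyzer, the query through the search-time one.
bool Hits(Func<string, string> indexAnalyzer, Func<string, string> searchAnalyzer, string stored, string query) =>
    indexAnalyzer(stored) == searchAnalyzer(query);

// Same analyzer on both sides, as with .Analyzer("folding-analyzer") alone.
Console.WriteLine(Hits(Fold, Fold, "Férias", "Ferias"));   // True
// Folding only at search time: the index still holds "Férias", so even the exact word misses.
Console.WriteLine(Hits(Keep, Fold, "Férias", "Férias"));   // False
```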