Reputation: 2190
I´m trying to make a query in ElasticSearch with the NEST c# client a query without accent, my data has portuguese latin word with accent. See the code bellow:
var result = client.Search<Book>(s => s
.From(0)
.Size(20)
.Fields(f => f.Title)
.FacetTerm(f => f.OnField(of => of.Genre))
.Query(q => q.QueryString(qs => qs.Query("sao")))
);
This search did not find anything. My data on this index contains many titles like: "São Cristóvan", "São Gonçalo".
var settings = new IndexSettings();
settings.NumberOfReplicas = 1;
settings.NumberOfShards = 5;
settings.Analysis.Analyzers.Add("snowball", new Nest.SnowballAnalyzer { Language = "Portuguese" });
var idx5 = client.CreateIndex("idx5", settings);
How I can make query "sao" and find "são" using ElasticSearch?
I think have to create index with right properties, but I already tried many settings like.
or in Raw Mode:
{ "idx" : { "settings" : { "index.analysis.filter.jus_stemmer.name" : "brazilian", "index.analysis.filter.jus_stop._lang_" : "brazilian" } } }
How can I make the search and ignore accents?
Thanks Friends,
Upvotes: 1
Views: 4039
Reputation: 2241
I stumbled upon this thread since I got the same problem. Here's the NEST code to create an index with an AsciiFolding Analyzer:
// Create the Client
string indexName = "testindex";
var uri = new Uri("http://localhost:9200");
var settings = new ConnectionSettings(uri).SetDefaultIndex(indexName);
var client = new ElasticClient(settings);
// Create new Index Settings
IndexSettings set = new IndexSettings();
// Create a Custom Analyzer ...
var an = new CustomAnalyzer();
// ... based on the standard Tokenizer
an.Tokenizer = "standard";
// ... with Filters from the StandardAnalyzer
an.Filter = new List<string>();
an.Filter.Add("standard");
an.Filter.Add("lowercase");
an.Filter.Add("stop");
// ... just adding the additional AsciiFoldingFilter at the end
an.Filter.Add("asciifolding");
// Add the Analyzer with a name
set.Analysis.Analyzers.Add("nospecialchars", an);
// Create the Index
client.CreateIndex(indexName, set);
Now you can Map your Entity to this index (it's important to do this after you created the Index)
client.MapFromAttributes<TestEntity>();
And here's how such an entity could look like:
[ElasticType(Name = "TestEntity", DisableAllField = true)]
public class TestEntity
{
public TestEntity(int id, string desc)
{
ID = id;
Description = desc;
}
public int ID { get; set; }
[ElasticProperty(Analyzer = "nospecialchars")]
public string Description { get; set; }
}
There you go, the Description-Field is now inserted into the index without accents. You can test this if you check the Mapping of your index:
http://localhost:9200/testindex/_mapping
Which then should look something like:
{
testindex: {
TestEntity: {
_all: {
enabled: false
},
properties: {
description: {
type: "string",
analyzer: "nospecialchars"
},
iD: {
type: "integer"
}
}
}
}
}
Hope this will help someone.
Upvotes: 3
Reputation: 2190
See the solution:
Connect on elasticsearch search with putty execute:
curl -XPOST 'localhost:9200/idx30/_close'
curl -XPUT 'localhost:9200/idx30/_settings' -d '{
"index.analysis.analyzer.default.filter.0": "standard",
"index.analysis.analyzer.default.tokenizer": "standard",
"index.analysis.analyzer.default.filter.1": "lowercase",
"index.analysis.analyzer.default.filter.2": "stop",
"index.analysis.analyzer.default.filter.3": "asciifolding",
"index.number_of_replicas": "1"
}'
curl -XPOST 'localhost:9200/idx30/_open'
Replace "idx30" with name of your index
Done!
Upvotes: 4
Reputation: 33341
You'll want to incorporate an ACSII Folding filter into your analyzer to accomplish this. That will mean constructing the snowballanalyzer form tokenizers and filters (unless nest
allows you to add filters to non-custom analyzers. ElasticSearch doesn't, though, as far as I know).
A SnowballAnalyzer incorporates:
I would probably try to add the ASCIIFoldingFilter
just before LowercaseFilter, although it might be better to add it as the very las step (after SnowballFilter). Try it both ways, see which works better. I don't know enough about either the Protuguese stemmer to say which would be best for sure.
Upvotes: 0