Reputation: 889
In my search, I want to account for the fact that the space character is optional in a filter request.
For example, when I filter on "THE ONE", I see the corresponding document. I want to see it even if I write "THEONE".
This is how my query is built today:
boolQueryBuilder.must(QueryBuilders.boolQuery()
    .should(QueryBuilders.wildcardQuery("description", "*" + searchedWord.toLowerCase() + "*"))
    .should(QueryBuilders.wildcardQuery("id", "*" + searchedWord.toUpperCase() + "*"))
    .should(QueryBuilders.wildcardQuery("label", "*" + searchedWord.toUpperCase() + "*"))
    .minimumShouldMatch("1"));
What I want is to add this filter (taken from "Writing a space-ignoring autocompleter with ElasticSearch"):
"word_joiner": {
"type": "word_delimiter",
"catenate_all": true
}
But I don't know how to do this using the API.
Any idea?
Thanks!
EDIT: Following @raam86's suggestion, I added my own custom analyzer:
{
"index": {
"number_of_shards": 1,
"analysis": {
"filter": {
"word_joiner": {
"type": "word_delimiter",
"catenate_all": true
}
},
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"word_joiner"
]
}
}
}
}
}
And here is the document:
@Document(indexName = "cake", type = "pa")
@Setting(settingPath = "/elasticsearch/config/settings.json")
public class PaElasticEntity implements Serializable {
@Field(type = FieldType.String, analyzer = "custom_analyzer")
private String maker;
}
Still not working...
Upvotes: 1
Views: 9470
Reputation: 309
You need a shingle token filter. Here is a simple example.
1. Create the index with these settings:
PUT joinword
{
"settings": {
"analysis": {
"filter": {
"word_joiner": {
"type": "shingle",
"output_unigrams": "true",
"token_separator": ""
}
},
"analyzer": {
"word_join_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"word_joiner"
]
}
}
}
}
}
2. Check that the analyzer works as expected:
GET joinword/_analyze?pretty
{
"analyzer": "word_join_analyzer",
"text": "ONE TWO"
}
Output:
{
"tokens" : [ {
"token" : "one",
"start_offset" : 0,
"end_offset" : 3,
"type" : "<ALPHANUM>",
"position" : 0
}, {
"token" : "onetwo",
"start_offset" : 0,
"end_offset" : 7,
"type" : "shingle",
"position" : 0
}, {
"token" : "two",
"start_offset" : 4,
"end_offset" : 7,
"type" : "<ALPHANUM>",
"position" : 1
} ]
}
So now you can find this document by one, two or onetwo. The search will be case insensitive.
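To make the token output above easier to reason about, here is a minimal plain-Java sketch of what the analyzer emits. It is only an illustration and does not call Elasticsearch: `ShingleSketch` and `analyze` are hypothetical names, and splitting on whitespace is a simplification of the standard tokenizer. It assumes the shingle filter's default `max_shingle_size` of 2, so only adjacent pairs of words are joined.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: mimics the tokens produced by word_join_analyzer.
// ShingleSketch and analyze are hypothetical names, not part of any library.
class ShingleSketch {

    // Lowercase the text, split it on whitespace (a simplification of the
    // standard tokenizer), then emit every word followed by the two-word
    // shingle that starts at it, joined with an empty separator.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        String normalized = text.trim().toLowerCase(Locale.ROOT);
        if (normalized.isEmpty()) {
            return tokens;
        }
        String[] words = normalized.split("\\s+");
        for (int i = 0; i < words.length; i++) {
            tokens.add(words[i]);                    // output_unigrams: true
            if (i + 1 < words.length) {
                tokens.add(words[i] + words[i + 1]); // token_separator: ""
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("ONE TWO")); // prints [one, onetwo, two]
    }
}
```

Because only pairs are joined under that default, a title such as "ONE TWO THREE" would yield onetwo and twothree but not onetwothree. If you need longer joins, raise `max_shingle_size` in the filter settings.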
Full project available on GitHub.
Entity:
@Document(indexName = "document", type = "document", createIndex = false)
@Setting(settingPath = "elasticsearch/document_index_settings.json")
public class DocumentES {
@Id()
private String id;
@Field(type = FieldType.String, analyzer = "word_join_analyzer")
private String title;
public DocumentES() {
}
public DocumentES(java.lang.String title) {
this.title = title;
}
public java.lang.String getId() {
return id;
}
public void setId(java.lang.String id) {
this.id = id;
}
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
@Override
public java.lang.String toString() {
return "DocumentES{" +
"id='" + id + '\'' +
", title='" + title + '\'' +
'}';
}
}
Main:
@SpringBootApplication
@EnableConfigurationProperties(value = {ElasticsearchProperties.class})
public class Application implements CommandLineRunner {
@Autowired
ElasticsearchTemplate elasticsearchTemplate;
public static void main(String[] args) {
SpringApplication.run(Application.class);
}
@Override
public void run(String... args) throws Exception {
elasticsearchTemplate.createIndex(DocumentES.class);
elasticsearchTemplate.putMapping(DocumentES.class);
elasticsearchTemplate.index(new IndexQueryBuilder()
.withIndexName("document")
.withType("document")
.withObject(new DocumentES("ONE TWO")).build()
);
// Crude wait so the index refreshes and the new document becomes searchable.
Thread.sleep(2000);
NativeSearchQuery query = new NativeSearchQueryBuilder()
.withIndices("document")
.withTypes("document")
.withQuery(matchQuery("title", "ONEtWO"))
.build();
List<DocumentES> result = elasticsearchTemplate.queryForList(query, DocumentES.class);
result.forEach(System.out::println);
}
}
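Why does the query "ONEtWO" find the document titled "ONE TWO"? A match query analyzes the query text with the field's analyzer, so the single word is lowercased to onetwo, which is one of the terms stored for the title. The sketch below is a simplified plain-Java model of that lookup, not Elasticsearch code: `MatchSketch` and `matchesSingleWord` are hypothetical names, and the term set is copied from the `_analyze` output in this answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustration only: a simplified model of a single-word match query
// against the terms indexed for the title "ONE TWO".
// MatchSketch and matchesSingleWord are hypothetical names.
class MatchSketch {

    // Terms stored for "ONE TWO", as reported by the _analyze call.
    static final Set<String> INDEXED_TERMS =
            new HashSet<>(Arrays.asList("one", "onetwo", "two"));

    // A single-word query stays one token; the lowercase filter
    // normalizes its case before it is compared with the stored terms.
    static boolean matchesSingleWord(String query) {
        String term = query.trim().toLowerCase(Locale.ROOT);
        return INDEXED_TERMS.contains(term);
    }

    public static void main(String[] args) {
        System.out.println(matchesSingleWord("ONEtWO"));      // prints true
        System.out.println(matchesSingleWord("ONETWOTHREE")); // prints false
    }
}
```

This is also why the analyzer has to be applied at index time: the joined term must already exist in the index for the un-spaced query to hit it.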
Upvotes: 5