Reputation: 555
I use the whitespace analyzer to index my field named hash, so the field text '1 2 3 4 5' is indexed as five terms [1, 2, 3, 4, 5].
My question is how to match on exact term positions. For example, with a required accuracy of at least 4/5, '2 1 3 4 5' should not match, but '8 2 3 4 5' should. How can I do that?
Splitting the value into five fields would work, but I want to keep just one field.
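To pin down the matching rule the question describes, here is a small local sketch (not Elasticsearch code) that counts positions whose values agree. Since '8 2 3 4 5' agrees on exactly 4 of 5 positions and is supposed to match, the threshold is read as inclusive ("at least 4/5"). The function names are hypothetical.

```python
def matching_positions(stored: str, candidate: str) -> int:
    """Count positions where both whitespace-separated sequences hold the same value."""
    return sum(1 for x, y in zip(stored.split(), candidate.split()) if x == y)


def is_match(stored: str, candidate: str, num: int = 4, den: int = 5) -> bool:
    """Accept when at least num/den of the positions agree (integer math avoids float rounding)."""
    total = max(len(stored.split()), len(candidate.split()))
    return matching_positions(stored, candidate) * den >= num * total


print(is_match("1 2 3 4 5", "2 1 3 4 5"))  # False: only positions 2, 3 and 4 agree
print(is_match("1 2 3 4 5", "8 2 3 4 5"))  # True: positions 1 to 4 agree
```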
Upvotes: 2
Views: 697
Reputation: 555
Use the whitespace analyzer and make the position part of the text value: before indexing, change '1 2 3 4 5' to '0_1 1_2 2_3 3_4 4_5', where 0_1 means that position 0 holds the value 1. Only one field is indexed, but you still need a multi-term query when searching.
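The conversion happens in your own code before the document is sent to Elasticsearch. A minimal sketch, assuming a hypothetical helper named to_position_terms; the whitespace analyzer then keeps each "position_value" pair as a single term because it only splits on whitespace. The same conversion has to be applied to the search input.

```python
def to_position_terms(text: str) -> str:
    """Prefix every whitespace-separated value with its 0-based position, e.g. '1 2' -> '0_1 1_2'."""
    return " ".join(f"{pos}_{value}" for pos, value in enumerate(text.split()))


print(to_position_terms("1 2 3 4 5"))  # 0_1 1_2 2_3 3_4 4_5
```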
Query for '8 2 3 4 5':
{
  "query": {
    "bool": {
      "should": [
        { "term": { "hash": "0_8" } },
        { "term": { "hash": "1_2" } },
        { "term": { "hash": "2_3" } },
        { "term": { "hash": "3_4" } },
        { "term": { "hash": "4_5" } }
      ],
      "minimum_should_match": 4
    }
  }
}
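Building that request body by hand for every search is error-prone, so here is a sketch that generates it from the raw input, plus an offline check of the same "at least N shared terms" rule. The function names are hypothetical, and the dict would still have to be sent to the index's _search endpoint with whatever client you use; no client call is shown.

```python
import json


def build_position_query(text: str, minimum_should_match: int, field: str = "hash") -> dict:
    """Build a bool query with one term clause per position-tagged value."""
    clauses = [
        {"term": {field: f"{pos}_{value}"}}
        for pos, value in enumerate(text.split())
    ]
    return {"query": {"bool": {"should": clauses, "minimum_should_match": minimum_should_match}}}


def matches_locally(stored: str, query: str, minimum_should_match: int) -> bool:
    """Offline check of the same rule: count position-tagged terms shared by document and query."""
    def tag(s: str) -> set:
        return {f"{pos}_{value}" for pos, value in enumerate(s.split())}
    return len(tag(stored) & tag(query)) >= minimum_should_match


print(json.dumps(build_position_query("8 2 3 4 5", 4), indent=2))
print(matches_locally("1 2 3 4 5", "8 2 3 4 5", 4))  # True (4 shared terms)
print(matches_locally("1 2 3 4 5", "2 1 3 4 5", 4))  # False (3 shared terms)
```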
Upvotes: 0
Reputation: 7221
You can use a combination of a shingle token filter at index time and minimum_should_match at query time:
Explanation:
With a shingle token filter (and output_unigrams disabled), "1 2 3 4 5" is transformed into this token stream:
{
"tokens": [
{
"token": "1 2",
"start_offset": 0,
"end_offset": 3,
"type": "shingle",
"position": 0
},
{
"token": "2 3",
"start_offset": 2,
"end_offset": 5,
"type": "shingle",
"position": 1
},
{
"token": "3 4",
"start_offset": 4,
"end_offset": 7,
"type": "shingle",
"position": 2
},
{
"token": "4 5",
"start_offset": 6,
"end_offset": 9,
"type": "shingle",
"position": 3
}
]
}
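To see where those tokens, offsets and positions come from, here is a simplified model of a whitespace tokenizer followed by a size-2 shingle filter with output_unigrams disabled. It is an illustration, not Lucene's ShingleFilter: it assumes the default shingle size of 2 and the default space token_separator, and ignores filler tokens and other options.

```python
import re


def shingle_tokens(text: str, size: int = 2) -> list:
    """Simplified model of whitespace tokenizer + shingle filter with output_unigrams=false."""
    # Whitespace tokenization, keeping each word's character offsets.
    words = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    tokens = []
    for pos in range(len(words) - size + 1):
        window = words[pos:pos + size]
        tokens.append({
            "token": " ".join(w[0] for w in window),  # default token_separator is a space
            "start_offset": window[0][1],
            "end_offset": window[-1][2],
            "type": "shingle",
            "position": pos,
        })
    return tokens


for t in shingle_tokens("1 2 3 4 5"):
    print(t)
```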
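The same analyzer is applied to your query text, so a query shingle only matches if its two numbers appear next to each other, in the same order, in the document. minimum_should_match then controls the percentage of the query's shingles that must match in the document. Note that the percentage applies to shingles, not to individual numbers: five numbers produce four shingles, and 80% of 4 is rounded down to 3.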
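The arithmetic behind "80%" can be checked quickly. Elasticsearch rounds a positive percentage of the optional clauses down; the sketch below assumes each query shingle becomes one optional clause, which is how the match query treats the analyzed tokens here. Function names are hypothetical.

```python
def required_matches(optional_clauses: int, percent: int) -> int:
    """Positive percentage minimum_should_match: share of optional clauses, rounded down."""
    return optional_clauses * percent // 100


def shingle_count(numbers: int, size: int = 2) -> int:
    """A sequence of n values yields n - size + 1 shingles of the given size."""
    return max(numbers - size + 1, 0)


for n in (5, 4):
    clauses = shingle_count(n)
    print(n, "numbers ->", clauses, "shingles ->", required_matches(clauses, 80), "required")
# 5 numbers -> 4 shingles -> 3 required
# 4 numbers -> 3 shingles -> 2 required
```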
Here is a complete example.
In the index settings we configure the shingle filter and an analyzer that uses it, then apply that analyzer to the content field in the mapping:
PUT so_54684997
{
"mappings": {
"_doc": {
"properties": {
"content": {
"type": "text",
"analyzer": "myShingledAnalyzer"
}
}
}
},
"settings": {
"analysis": {
"filter": {
"myShingle": {
"type": "shingle",
"output_unigrams": false
}
},
"analyzer": {
"myShingledAnalyzer": {
"tokenizer": "whitespace",
"filter": ["myShingle"]
}
}
}
}
}
Then we index the document:
PUT so_54684997/_doc/1
{
"content": "1 2 3 4 5"
}
Query 1 => no match (all the numbers are present, but fewer than 4/5 of them are in the same order)
POST so_54684997/_search
{
"query": {
"match": {
"content": {
"query": "2 1 3 4 5",
"minimum_should_match": "80%"
}
}
}
}
Query 2 => match (4 of the 5 numbers, in the correct order)
POST so_54684997/_search
{
"query": {
"match": {
"content": {
"query": "1 2 3 4",
"minimum_should_match": "80%"
}
}
}
}
Query 3 => match (4 of the 5 numbers, in the same order)
POST so_54684997/_search
{
"query": {
"match": {
"content": {
"query": "8 2 3 4 5",
"minimum_should_match": "80%"
}
}
}
}
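The three outcomes can be reproduced offline with a rough model of the shingle analyzer plus the rounded-down percentage. This is a hypothetical sketch, not Elasticsearch itself: it treats shingles as a set (so repeated shingles are counted once) and assumes one optional clause per query shingle.

```python
def shingles(text: str, size: int = 2) -> set:
    """Set of adjacent-value shingles, joined with a space like the default token_separator."""
    words = text.split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def would_match(doc: str, query: str, percent: int = 80) -> bool:
    query_shingles = shingles(query)
    if not query_shingles:
        # The analyzed query is empty; match's default zero_terms_query is "none", so nothing matches.
        return False
    required = len(query_shingles) * percent // 100  # positive percentage, rounded down
    return len(shingles(doc) & query_shingles) >= required


for q in ("2 1 3 4 5", "1 2 3 4", "8 2 3 4 5", "1 2 9 4 5"):
    print(q, "->", would_match("1 2 3 4 5", q))
# 2 1 3 4 5 -> False
# 1 2 3 4 -> True
# 8 2 3 4 5 -> True
# 1 2 9 4 5 -> False
```

The last case shows a limit of this approach under the model: '1 2 9 4 5' agrees with the document on 4 of 5 positions, but changing a middle value breaks two shingles, so only 2 of 4 shingles match and the query is rejected.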
I don't know whether this will handle all your cases, but I think it's a good starting point!
Upvotes: 2