Reputation: 10376
I'm using default analyzers and indexing. So let's say I have this simple mapping:
"question": {
"properties": {
"title": {
"type": "string"
},
"answer": {
"properties": {
"text": {
"type": "string"
}
}
}
}
}
(that was an example. sorry if it has typos)
Now, I perform the following search.
GET _search
{
"query": {
"query_string": {
"query": "yes correct",
"fields": ["answer.text"]
}
}
}
The results will score a text
value like "yes correct." (doc id value 1
) higher than simply "yes correct" (without a period, doc id value 181
). Both hits have the same score value, but the hits array lists the one with the smaller doc id first. I understand that the default index option includes sorting by doc id, so how do I exclude that one attribute and still use the rest of the default options?
I'm not setting any custom analyzers, so everything is using default values for Elasticsearch 2.0.
Upvotes: 1
Views: 1278
Reputation: 14097
This is probably a use case for Dis Max Query
A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.
So following that, you need to make your answer score as an exact match and give it highest boost. You'll have to use a custom analyzer for that. That'd be your mappings:
PUT /test
{
"settings": {
"analysis": {
"analyzer": {
"my_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"asciifolding",
"lowercase"
]
}
}
}
},
"mappings": {
"question": {
"properties": {
"title": {
"type": "string"
},
"answer": {
"type": "object",
"properties": {
"text": {
"type": "string",
"analyzer": "my_keyword",
"fields": {
"stemmed": {
"type": "string",
"analyzer": "standard"
}
}
}
}
}
}
}
}
}
Your test data:
PUT /test/question/1
{
"title": "title nr1",
"answer": [
{
"text": "yes correct."
}
]
}
PUT /test/question/2
{
"title": "title nr2",
"answer": [
{
"text": "yes correct"
}
]
}
Now when you're querying for "yes correct."
using such query:
POST /test/_search
{
"query": {
"dis_max": {
"tie_breaker": 0.7,
"boost": 1.2,
"queries": [
{
"match": {
"answer.text": {
"query": "yes correct.",
"type": "phrase"
}
}
},
{
"match": {
"answer.text.stemmed": {
"query": "yes correct.",
"operator": "and"
}
}
}
]
}
}
}
You get this output:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.37919715,
"hits": [
{
"_index": "test",
"_type": "question",
"_id": "1",
"_score": 0.37919715,
"_source": {
"title": "title nr1",
"answer": [
{
"text": "yes correct."
}
]
}
},
{
"_index": "test",
"_type": "question",
"_id": "2",
"_score": 0.11261705,
"_source": {
"title": "title nr2",
"answer": [
{
"text": "yes correct"
}
]
}
}
]
}
}
If you run very same query without trailing dot, which then becomes "yes correct"
, you're getting this result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.37919715,
"hits": [
{
"_index": "test",
"_type": "question",
"_id": "2",
"_score": 0.37919715,
"_source": {
"title": "title nr2",
"answer": [
{
"text": "yes correct"
}
]
}
},
{
"_index": "test",
"_type": "question",
"_id": "1",
"_score": 0.11261705,
"_source": {
"title": "title nr1",
"answer": [
{
"text": "yes correct."
}
]
}
}
]
}
}
Hopefully this is what you're looking for.
By the way, I'd recommend to always use Match query when performing text search. Taken from documentation:
Comparison to query_string / field
The match family of queries does not go through a "query parsing" process. It does not support field name prefixes, wildcard characters, or other "advanced" features. For this reason, chances of it failing are very small / non existent, and it provides an excellent behavior when it comes to just analyze and run that text as a query behavior (which is usually what a text search box does). Also, the phrase_prefix type can provide a great "as you type" behavior to automatically load search results.
Upvotes: 1
Reputation: 19283
Elasticsearch or rather Lucene scoring does not take into account the relative positioning of the tokens. It utlizes 3 different criterias to do the same
You can learn more about it here.
Upvotes: 0