Reputation: 508
So let's say I have an ElasticSearch index defined like this:
curl -XPUT 'http://localhost:9200/test' -d '{
"mappings": {
"example": {
"properties": {
"text": {
"type": "string",
"analyzer": "snowball"
}
}
}
}
}'
curl -XPUT 'http://localhost:9200/test/example/1' -d '{
"text": "foo bar organization"
}'
When I search for "foo organizations" with the snowball analyzer, both keywords match as expected:
curl -XGET http://localhost:9200/test/example/_search -d '{
"query": {
"text": {
"_all": {
"query": "foo organizations",
"analyzer": "snowball"
}
}
},
"highlight": {
"fields": {
"text": {}
}
}
}'
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.015912745,
"hits": [
{
"_index": "test",
"_type": "example",
"_id": "1",
"_score": 0.015912745,
"_source": {
"text": "foo bar organization"
},
"highlight": {
"text": [
"<em>foo</em> bar <em>organization</em>"
]
}
}
]
}
}
But when I search for only "organizations", I get no results at all, which is very strange:
curl -XGET http://localhost:9200/test/example/_search -d '{
"query": {
"text": {
"_all": {
"query": "organizations",
"analyzer": "snowball"
}
}
},
"highlight": {
"fields": {
"text": {}
}
}
}'
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
However, if I search for "bars" it still hits:
curl -XGET http://localhost:9200/test/example/_search -d '{
"query": {
"text": {
"_all": {
"query": "bars",
"analyzer": "snowball"
}
}
},
"highlight": {
"fields": {
"text": {}
}
}
}'
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.10848885,
"hits": [
{
"_index": "test",
"_type": "example",
"_id": "1",
"_score": 0.10848885,
"_source": {
"text": "foo bar organization"
},
"highlight": {
"text": [
"foo <em>bar</em> organization"
]
}
}
]
}
}
I guess the difference between "bar" and "organization" is that "organization" is stemmed to "organ" while "bar" is stemmed to itself. But how do I get the proper behaviour, so that the second search hits?
Upvotes: 1
Views: 2072
Reputation: 279
It's better to apply the analyzer at index time than at search time. Map your text field to the snowball analyzer and then index the document. Both "organization" and "organizations" are then reduced to the same indexed stem, so either form matches. It works for me.
Upvotes: 0
Reputation: 30163
The text "foo bar organization" is indexed twice: in the field text and in the field _all. The field text uses the snowball analyzer, while the field _all uses the standard analyzer. Therefore, after analysis of the test record, the field _all contains the tokens "foo", "bar", and "organization". During search, the specified snowball analyzer converts "foo" into "foo", "bars" into "bar", and "organizations" into "organ". So the words "foo" and "bars" in the queries match the test record, and the term "organizations" doesn't, because the stem "organ" was never indexed in _all. Highlighting is performed per field, independently of searching, which is why the word "organization" is highlighted in the first result.
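To make the mismatch concrete, here is a small toy model in Python. It is only a sketch: the stem table is a hypothetical stand-in that hard-codes the stems mentioned in this thread (it is not the real Snowball stemmer), and the "standard" analyzer is reduced to lowercasing and whitespace splitting. The function names are made up for illustration.

```python
# Stems quoted in this thread; a stand-in lookup table, NOT the real Snowball stemmer.
SNOWBALL_STEMS = {
    "foo": "foo",
    "bar": "bar",
    "bars": "bar",
    "organization": "organ",
    "organizations": "organ",
}


def standard_analyze(text):
    """Toy standard analyzer: lowercase and split on whitespace, no stemming."""
    return text.lower().split()


def snowball_analyze(text):
    """Toy snowball analyzer: tokenize, then stem via the lookup table above."""
    return [SNOWBALL_STEMS.get(tok, tok) for tok in standard_analyze(text)]


def matching_terms(query_tokens, indexed_tokens):
    """Return the query tokens that appear verbatim among the indexed tokens."""
    indexed = set(indexed_tokens)
    return [tok for tok in query_tokens if tok in indexed]


doc = "foo bar organization"
all_field = standard_analyze(doc)    # tokens held by _all (unstemmed)
text_field = snowball_analyze(doc)   # tokens held by the text field (stemmed)

for query in ("foo organizations", "organizations", "bars"):
    terms = snowball_analyze(query)  # the search-time analyzer from the question
    print(query,
          "| _all:", matching_terms(terms, all_field),
          "| text:", matching_terms(terms, text_field))
```

In this model the one-word query "organizations" finds nothing in _all but does find "organ" in the text field, which suggests pointing the query at text instead of _all so that index-time and search-time analysis agree.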
Upvotes: 1