Reputation: 155
I'm trying to implement an auto-suggest control powered by an ES index. The index has multiple fields, and I want to be able to query across multiple fields using the AND operator, while allowing for partial (prefix-only) matches.
Just as an example, let's say I have two fields I want to query on: "colour" and "animal". I would like to be able to fulfil queries like "duc", "duck", "purpl", "purple", "purple duck". I managed to get all of these working using a multi_match query with the AND operator.
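For reference, a sketch of the kind of multi_match query I have working so far (the index name is made up, and the field names are just from the example above):

```json
POST /my_index/_search
{
    "query": {
        "multi_match": {
            "query": "purple duck",
            "fields": ["colour", "animal"],
            "operator": "and"
        }
    }
}
```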
What I don't seem to be able to do is match on queries like "purple duc", as multi_match doesn't allow for wildcards.
I've looked into match_phrase_prefix(), but as I understand it, it doesn't span multiple fields.
I'm now looking at implementing a custom tokeniser: it feels like the solution may lie there. So ultimately my questions are:
1) can someone confirm there's no out-of-the-box function to do what I want to do? It feels like a common enough pattern that there could be something ready to use.
2) can someone suggest any solution? Are tokenizers part of the solution? I'm more than happy to be pointed in the right direction and do more research myself. Obviously if someone has working solutions to share that would be awesome.
Thanks in advance - F
Upvotes: 8
Views: 5798
Reputation: 8718
I actually wrote a blog post about this a while back for Qbox, which you can find here: http://blog.qbox.io/multi-field-partial-word-autocomplete-in-elasticsearch-using-ngrams. (Unfortunately some of the links in the post are broken, and can't easily be fixed at this point, but hopefully you'll get the idea.)
I'll refer you to the post for the details, but here is some code you can use to test it out quickly. Note that I'm using edge ngrams instead of full ngrams.
Also note in particular the use of the _all field, and the "operator" setting in the match query.
Okay, so here is the mapping:
PUT /test_index
{
   "settings": {
      "analysis": {
         "filter": {
            "edgeNGram_filter": {
               "type": "edgeNGram",
               "min_gram": 2,
               "max_gram": 20
            }
         },
         "analyzer": {
            "edgeNGram_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding",
                  "edgeNGram_filter"
               ]
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "_all": {
            "enabled": true,
            "index_analyzer": "edgeNGram_analyzer",
            "search_analyzer": "standard"
         },
         "properties": {
            "field1": {
               "type": "string",
               "include_in_all": true
            },
            "field2": {
               "type": "string",
               "include_in_all": true
            }
         }
      }
   }
}
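If you want to see what the index analyzer actually produces, you can run a term through the _analyze API against the index defined above. For "purple" the returned tokens should include the prefixes ("pu", "pur", "purp", "purpl", "purple"), which is what makes the partial matching work:

```json
GET /test_index/_analyze?analyzer=edgeNGram_analyzer&text=purple
```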
Now add a few documents:
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"field1":"purple duck","field2":"brown fox"}
{"index":{"_id":2}}
{"field1":"slow purple duck","field2":"quick brown fox"}
{"index":{"_id":3}}
{"field1":"red turtle","field2":"quick rabbit"}
And this query seems to illustrate what you're wanting:
POST /test_index/_search
{
   "query": {
      "match": {
         "_all": {
            "query": "purp fo slo",
            "operator": "and"
         }
      }
   }
}
returning:
{
   "took": 5,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.19930676,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.19930676,
            "_source": {
               "field1": "slow purple duck",
               "field2": "quick brown fox"
            }
         }
      ]
   }
}
Here is the code I used to test it out:
http://sense.qbox.io/gist/b87e426062f453d946d643c7fa3d5480cd8e26ec
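One caveat: the _all field has been deprecated and removed in more recent versions of Elasticsearch, so if you're on a newer release, the usual substitute is a copy_to field that you analyze and query the same way. A rough, untested sketch of the equivalent mapping (the "combined" field name is made up; the analysis settings would be the same as above, though in newer versions the filter type is spelled "edge_ngram" and field types are "text" rather than "string"):

```json
PUT /test_index
{
   "mappings": {
      "properties": {
         "combined": {
            "type": "text",
            "analyzer": "edgeNGram_analyzer",
            "search_analyzer": "standard"
         },
         "field1": { "type": "text", "copy_to": "combined" },
         "field2": { "type": "text", "copy_to": "combined" }
      }
   }
}
```

Then point the match query at "combined" instead of "_all".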
Upvotes: 10