Reputation: 3924
Can Elasticsearch split an input string into categorized words? For example, if the input is
4star wi-fi 99$
and we are searching for hotels with ES, is it possible to analyze/tokenize this string as
4star - hotel level, wi-fi - hotel amenities, 99$ - price
?
yep, it's a noob question :)
Upvotes: 0
Views: 150
Reputation: 22332
Yes and no.
By default, query_string searches will work against the automatically created _all field. The contents of the _all field come from literally and naively combining all fields into a single analyzed string.
As such, if you have a "4star" rating, a "wi-fi" amenity, and a "99$" price, then all of those values would be inside of the _all field and you should get relevant hits against it. For example:
{
  "level" : "4star",
  "amenity" : ["pool", "wi-fi"],
  "price" : 99.99
}
The problem is that, without client-side effort, you will not know which field(s) matched when searching against _all. It won't tell you the breakdown of where each value came from; rather, it will simply report a score that reflects the overall relevance.
If you have some way of knowing which field each term (or terms) is meant to search against, then you can easily do this yourself (quotes aren't required, but they're good to have to avoid mistakes with spaces). This would be the input that you might provide to the query_string query mentioned above:
level:"4star" amenity:"wi-fi" price:<100
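As a rough sketch of how a client might assemble that string once it already knows which field each value belongs to, something like the following could work. The helper name build_query_string and its parameters are made up for illustration; only the field names (level, amenity, price) come from the example document above. Note that space-separated clauses are combined with the default operator, which is OR unless you change it.

```python
import json


def _quote(value):
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return '"%s"' % escaped


def build_query_string(level=None, amenities=(), max_price=None):
    """Hypothetical helper: build a field-qualified query_string expression."""
    clauses = []
    if level is not None:
        clauses.append("level:" + _quote(level))
    for amenity in amenities:
        clauses.append("amenity:" + _quote(amenity))
    if max_price is not None:
        # "<" is the query_string shorthand for an exclusive upper bound.
        clauses.append("price:<%s" % max_price)
    return " ".join(clauses)


if __name__ == "__main__":
    expression = build_query_string("4star", ["wi-fi"], 100)
    # Request body for the search endpoint, wrapping the expression.
    body = {"query": {"query_string": {"query": expression}}}
    print(json.dumps(body, indent=2))
```

The quoting step is there for the reason mentioned above: a value such as "free parking" would otherwise be split at the space and only the first word would be tied to the field.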
You could further complicate this by using a spelled out query:
{
  "query" : {
    "bool" : {
      "must" : [
        { "match" : { "level" : "4star" } },
        { "match" : { "amenity" : "wi-fi" } },
        {
          "range" : {
            "price" : {
              "lt" : 100
            }
          }
        }
      ]
    }
  }
}
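If the values come from form fields, the same body can be generated rather than hand-written. This is only a sketch: build_bool_query and its parameters are hypothetical names, and it assumes the level, amenity, and price fields from the example document. Clauses are added only for the values the user actually supplied.

```python
import json


def build_bool_query(level=None, amenities=(), max_price=None):
    """Hypothetical helper: build a bool/must search body from known fields."""
    must = []
    if level is not None:
        must.append({"match": {"level": level}})
    for amenity in amenities:
        must.append({"match": {"amenity": amenity}})
    if max_price is not None:
        # "lt" is an exclusive upper bound, as in the hand-written query.
        must.append({"range": {"price": {"lt": max_price}}})
    if not must:
        # Nothing selected: fall back to matching every document.
        return {"query": {"match_all": {}}}
    return {"query": {"bool": {"must": must}}}


if __name__ == "__main__":
    print(json.dumps(build_bool_query("4star", ["wi-fi"], 100), indent=2))
```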
Naturally, the last two requests require advance knowledge of which field each search term refers to. You could certainly use the $ in "99$" as a tip-off for price, but there is no equivalent marker for the others. Chances are you wouldn't have users typing in "4 stars" anyway, I hope, but rather picking from checkboxes or other form-based selections, so knowing the field up front should be quite realistic.
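If you do end up with free-text input, the categorizing has to happen client-side before the query is built. Here is a minimal sketch of that idea. The rules are my own assumptions, not anything Elasticsearch provides: a token containing $ is treated as a price, a token shaped like "4star" is treated as the level, and everything else is assumed to be an amenity. The function name categorize is hypothetical too.

```python
import re

# Assumed input conventions (not part of Elasticsearch):
#   "99$" or "$99.50" -> price, "4star" / "4-stars" -> level, rest -> amenity.
PRICE_RE = re.compile(r"^\$?(\d+(?:\.\d+)?)\$?$")
LEVEL_RE = re.compile(r"^(\d)-?stars?$", re.IGNORECASE)


def categorize(text):
    """Split free text on whitespace and guess a field for each token."""
    result = {"level": None, "price": None, "amenity": []}
    for token in text.split():
        level = LEVEL_RE.match(token)
        price = PRICE_RE.match(token)
        if level:
            result["level"] = level.group(1) + "star"
        elif price and "$" in token:
            # Only numbers carrying a "$" count as a price.
            result["price"] = float(price.group(1))
        else:
            result["amenity"].append(token.lower())
    return result


if __name__ == "__main__":
    print(categorize("4star wi-fi 99$"))
```

This falls apart quickly on real input ("four stars", "under 100", multi-word amenities), which is why form-based selections are the more dependable route.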
Technically, you could create a custom analyzer that recognized each term based on their position, but that's not really a good or useful idea.
Upvotes: 2