Reputation: 6955
I'm implementing a sort of "natural language" search assistant. I have a form with a number of select fields. The list of options in each field can be pretty lengthy. So rather than having to select each item individually, I'm adding a text input box where people can just type what they're looking for and the app will suggest possible searches, based on the options in the select dropdowns.
Let's say my options are:
If you typed in "2007 very small star-spangled", the text input would suggest "Search all 2007 very small widgets for 'star-spangled'". It understood that "2007" and "very small" were select options in the form, and that "star-spangled" was not, and suggested a search where "2007" and "very small" are selected, and then left the "star-spangled" bit for a plaintext search.
What I'm working on right now is parsing the search query and picking out the bits that fit into the select fields. I have all the options in Elasticsearch. I was thinking of searching each type individually to see if it matches anything in the search query. That seems straightforward to me. I can easily find matches. However, I don't know which part of the query actually matches each type, which I need in order to find out that e.g. "star-spangled" is the part that didn't match options.
So, in the end, I need to know that only the "2007" substring matched the year, only the "very small" substring matched the size, and "star-spangled" didn't match anything.
My first thought is to split the query into word-grams (e.g. "2007", "2007 very", "2007 very small", "2007 very small star-spangled", "very", "very small", "very small star-spangled", "small", "small star-spangled", "star-spangled") and search each option for each gram. Then I would know for sure which gram matched. However, this could obviously get resource intensive pretty quickly. Also, I know Elasticsearch can do that sort of search internally much faster.
So what I really need is to be able to perform a search and, along with the results, get back which part of the original query actually matched. So if I searched, "2007 verr small" (intentional misspelling) and did a fuzzy search of sizes, passing the entire query string, and I get the "Very Small" size back as a result, it would indicate that "verr small" is the part of the query that matched that size.
Any idea of how to do that? Or possibly some other solutions?
I could do the search and parse the results to see which bits match the string. Though I could see that being resource intensive as well. And if I'm doing a fuzzy search, it wouldn't necessarily be clear which part of the query triggered a match in the result.
I was also thinking that highlighting might work for this, but I don't know enough about Elasticsearch to know for sure.
EDIT: I tested this out using highlighting. It's so close to working. The highlight field comes back with the part of the string that matches. However, it only shows the part of the result that matches. It doesn't show the part of the query that matches. So if I want to allow for fuzzy searches, the highlight field won't match the original query and I won't be able to tell which part of the query matched. For example, a query of "very smaal" will return the size "Very Small", but the highlight field will show <em>very</em> <em>small</em>
, not <em>very</em> <em>smaal</em>
.
Upvotes: 3
Views: 1324
Reputation: 15559
There are 2 types of queries in Elasticsearch, Match Query and Filtered Query. Match query matches your term in the documents and find all the relevant documents with a relevance score. For example when you search for term: "help fixing javascript problem" you are interested in all documents which contain one or more of the search term.
On the other hand, when you are using Filtered Query, a document is either a match or not match... there is no relevance score here... as an example, you want all the products built in year "2007"... here you need to use a filtered query. All the product built in 2007 have the same score and all other years are excluded from the result.
In my opinion, your problem should be dealt with Filter Query...
When using filter query, normally each filter has its own corresponding input in the UI, consider the following screen-shot which is from ebay:
If I have understood your requirement correctly, you want to include all those filters in a single search-box. In my opinion, this is nearly impossible to implement because you have no way to parse user input and decide which word corresponds to which filter...
If you want to go down the filter path, it's better to introduce corresponding UI fields for each filter...
If you want to stick to a single search box, then don't implement the filter functionality and stick to Elasticsearch Multi-match query... you can match the input term across multiple fields but you won't be able to filter out (exclude) result instead you get a relevance score.
Upvotes: 1