azangru

Reputation: 2738

Search for multiple incomplete words with Elasticsearch

I have a database of records, each of which has a right and a left field, and both these fields contain text. The database is indexed with Elasticsearch.

I want to search both fields of these records and find the records in which either field contains words starting with each of two or more given prefixes. The search should be strict: it should return only the records that match every word in the query, not just some of them.

For example, the query "qui bro" should return the record containing the sentence "The quick brown fox jumped over the lazy dog", but not the one containing the sentence "The quick fox jumped over the lazy dog".

I've seen a description of how to perform prefix queries with Elasticsearch (and can reproduce it when searching for one word in one field).

I've also seen a description of how to perform multi-match queries to search through several fields at once.

But what I need is a combination of these techniques: one that lets me search several fields at once, match on parts of words, and return only the records that contain a match for every partial word in the query.

How can I do that? Any method will do (prefixes, ngrams, whatever).

(P.S.: My question may, to a certain extent, be a duplicate of this one, but since it was never answered, I hope I'm not breaking any rules by asking mine.)

======================================

UPDATED:

Oh, I might have an answer to the first part of the question. Here is the syntax that seems to work in my Rails app (using the elasticsearch-rails gem):

response = Paragraph.search query: {bool: { must: [ { prefix: {right: "qui"}}, {prefix: {right: "bro"}} ] } }

Or, to re-write it in pure Elasticsearch syntax:

{
  "bool": {
    "must": [
      { "prefix": { "right": "qui" }},
      { "prefix": { "right": "bro" }}
    ]
  }
}

So my updated question is how to combine this prefix search with a multi_match search (to search through both the right and the left field).

Upvotes: 2

Views: 995

Answers (1)

azangru

Reputation: 2738

OK, here is a possible answer that seems to work. The code has to search through multiple fields for several incomplete words and return only the records that contain all these words.

Here is the request written in elasticsearch-rails syntax:

response = Paragraph.search query: {bool: { must: [ { multi_match: { query: "qui", type: "phrase_prefix", fields: ["right", "left"]}}, { multi_match: { query: "brow", type: "phrase_prefix", fields: ["right", "left"]}}]}}

Or, rewritten in the JSON syntax that is used on the Elasticsearch site:

{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "qui",
            "type": "phrase_prefix",
            "fields": ["right", "left"]
          }
        },
        {
          "multi_match": {
            "query": "brow",
            "type": "phrase_prefix",
            "fields": ["right", "left"]
          }
        }
      ]
    }
  }
}

This seems to work. But if somebody has other solutions (particularly if these solutions will make the search case-insensitive), I will be happy to hear them.
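
For a query with an arbitrary number of words, the request above can be built programmatically instead of being written out by hand. Here is a minimal sketch in plain Ruby. The helper name build_prefix_query is my own invention (it is not part of elasticsearch-rails), and it assumes the words in the query are separated by whitespace:

```ruby
require "json"

# Hypothetical helper, not part of elasticsearch-rails.
# Builds one multi_match clause of type phrase_prefix per
# whitespace-separated word, and wraps all clauses in bool/must
# so that every word has to match in at least one of the fields.
def build_prefix_query(text, fields = %w[right left])
  clauses = text.split.map do |word|
    { multi_match: { query: word, type: "phrase_prefix", fields: fields } }
  end
  { query: { bool: { must: clauses } } }
end

body = build_prefix_query("qui brow")

# Print the request body as JSON to compare it with the request above.
puts JSON.pretty_generate(body)
```

In the Rails app the resulting hash would be passed the same way as before, e.g. Paragraph.search(body). As far as I understand, multi_match analyzes the query text with the field's analyzer, so with the default standard analyzer (which lowercases tokens) this search should already be case-insensitive. The plain prefix query from the question is a term-level query and is not analyzed, so there the input would probably need to be lowercased by hand.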

Upvotes: 2
