RPM1984

Reputation: 73123

How to combine completion, suggestion and match phrase across multiple text fields?

I've been reading about Elasticsearch suggesters, match phrase prefix and highlighting, and I'm a bit confused as to which to use for my problem.

Requirement: I have a bunch of different text fields and need to autocomplete and autosuggest across all of them, as well as handle misspellings. Basically the way Google works.

In the following Google snapshot, when we start typing "Can", it lists words like Canadian, Canada, etc. This is autocomplete. However, it also lists additional words like tire, post, post tracking, coronavirus, etc. This is autosuggest. It searches for the most relevant words in all fields. If we type "canxad", it should also suggest the same results despite the misspelling.

[screenshot: Google search suggestions shown while typing "Can"]

Could someone please give me some hints on how I can implement the above functionality across a bunch of text fields?

At first I tried this:

GET /myindex/_search
{
  "query": {
    "match_phrase_prefix": {
      "myFieldThatIsCombinedViaCopyTo": "revis"
    }
  },
  "highlight": {
    "fields": {
      "*": {}
    },
    "require_field_match" : false
  }
}

but it returns highlights like this:

"In the aforesaid revision filed by the members of the Committee, the present revisionist was also party",

So that's not a "prefix" anymore...

Also tried this:

GET /myindex/_search
{
  "query": {
    "multi_match": {
      "query": "revis",
      "fields": ["myFieldThatIsCombinedViaCopyTo"],
      "type": "phrase_prefix",
      "operator": "and"
    }
  },
  "highlight": {
    "fields": {
      "*": {}
    }
  }
}

But it still returns

"In the aforesaid revision filed by the members of the Committee, the present revisionist was also party",

Note: I have about 5 "text" fields that I need to search upon. One of those fields is quite long (1000s of words). If I break things up into keywords, I lose the phrase. So it's like I need match phrase prefix across a combined text field, with fuzziness?

EDIT Here's an example of a document (some fields taken out, content snipped):

{
  "id" : 1,
  "respondent" : "Union of India",
  "caseContent" : "<snip>..against the Union of India, through the ...<snip>"
}

As @Vlad suggested, I tried this:

POST /cases/_search
{
  "suggest": {
    "respondent-suggest": {
      "prefix": "uni",
      "completion": {
        "field": "respondent.suggest",
        "skip_duplicates": true
      }
    },
    "caseContent-suggest": {
      "prefix": "uni",
      "completion": {
        "field": "caseContent.suggest",
        "skip_duplicates": true
      }
    }
  }
}

Which returns this:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "suggest" : {
    "caseContent-suggest" : [
      {
        "text" : "uni",
        "offset" : 0,
        "length" : 3,
        "options" : [ ]
      }
    ],
    "respondent-suggest" : [
      {
        "text" : "uni",
        "offset" : 0,
        "length" : 3,
        "options" : [
          {
            "text" : "Union of India",
            "_index" : "cases",
            "_type" : "_doc",
            "_id" : "dI5hh3IBEqNFLVH6-aB9",
            "_score" : 1.0,
            "_ignored" : [
              "headNote.suggest"
            ],
            "_source" : {
              <snip>
            }
          }
        ]
      }
    ]
  }
}

So it looks like it matches on the respondent field, which is great! But it didn't match on the caseContent field, even though the text (see above) includes the phrase "against the Union of India". Shouldn't it match there? Or is it because of how the text is broken up?

Upvotes: 7

Views: 2844

Answers (1)

Val

Reputation: 217464

Since you need autocomplete/suggest on each field, you need to run a suggest query on each field and not on the copy_to field. That way you're guaranteed to have the proper prefixes.

copy_to fields are great for searching in multiple fields, but not so good for auto-suggest/-complete type of queries.

The idea is that for each of your fields, you should have a completion sub-field so that you can get auto-complete results for each of them.

PUT index
{
  "mappings": {
    "properties": {
      "text1": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      },
      "text2": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      },
      "text3": {
        "type": "text",
        "fields": {
          "suggest": {
            "type": "completion"
          }
        }
      }
    }
  }
}

Your suggest queries would then run on all the sub-fields directly:

POST index/_search?pretty
{
    "suggest": {
        "text1-suggest" : {
            "prefix" : "revis", 
            "completion" : { 
                "field" : "text1.suggest" 
            }
        },
        "text2-suggest" : {
            "prefix" : "revis", 
            "completion" : { 
                "field" : "text2.suggest" 
            }
        },
        "text3-suggest" : {
            "prefix" : "revis", 
            "completion" : { 
                "field" : "text3.suggest" 
            }
        }
    }
}

That takes care of the auto-complete/-suggest part. For misspellings, the completion suggester also lets you specify a fuzzy parameter in each suggest query.
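As a rough illustration of the two points above (one suggester per field, plus fuzzy), here is a small Python sketch that only builds the request body as a plain dict, so it runs without a cluster. The helper name build_suggest_body and the misspelled prefix "revsi" are made up for this example; the field names follow the text1/text2/text3 mapping above, and "AUTO" is just one possible fuzziness setting.

```python
import json


def build_suggest_body(prefix, fields, fuzziness="AUTO"):
    """Build a _search body with one fuzzy completion suggester per field.

    Hypothetical helper: it assumes every field has a `.suggest`
    completion sub-field, as in the mapping shown earlier.
    """
    suggest = {}
    for field in fields:
        suggest[f"{field}-suggest"] = {
            "prefix": prefix,
            "completion": {
                "field": f"{field}.suggest",
                "skip_duplicates": True,
                # The fuzzy block lets the suggester tolerate typos.
                "fuzzy": {"fuzziness": fuzziness},
            },
        }
    return {"suggest": suggest}


# "revsi" stands in for a mistyped "revis".
body = build_suggest_body("revsi", ["text1", "text2", "text3"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST index/_search. Whether a given typo actually matches depends on the fuzzy settings (fuzziness, min_length, prefix_length), so treat the values here as a starting point rather than a recommendation.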

UPDATE

If you need to do prefix search on all sentences within a body of text, the approach needs to change a bit.

The new mapping below creates a new completion field next to the text one. The idea is to apply a small transformation (i.e. split sentences) to what you're going to store in the completion field. So first create the index mapping like this:

PUT test
{
  "mappings": {
    "properties": {
      "text1": {
        "type": "text"
      },
      "text1Suggest": {
        "type": "completion"
      }
    }
  }
}

Then create an ingest pipeline that will populate the text1Suggest field with sentences from the text1 field:

PUT _ingest/pipeline/sentence
{
  "processors": [
    {
      "split": {
        "field": "text1",
        "target_field": "text1Suggest.input",
        "separator": "\\.\\s+"
      }
    }
  ]
}

Then we can index a document such as this one (only the text1 field is needed, since the completion field is built by the pipeline):

PUT test/_doc/1?pipeline=sentence
{
  "text1": "The crazy fox. The cat drinks milk. John goes to the beach"
}

What gets indexed looks like this (your text1 field + another completion field optimized for sentence prefix completion):

{
  "text1": "The crazy fox. The cat drinks milk. John goes to the beach",
  "text1Suggest": {
    "input": [
      "The crazy fox",
      "The cat drinks milk",
      "John goes to the beach"
    ]
  }
}
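To make the effect of the split step more concrete, here is a rough Python stand-in for it. It applies the same regular expression as the pipeline's separator and then checks which resulting inputs start with a given prefix. Both function names are invented for this sketch, and the startswith check is only a crude approximation of how the completion suggester matches (the real one works on analyzed input), but it shows why a word in the middle of a sentence does not produce a suggestion.

```python
import re

# Same pattern as the pipeline's "separator" once the JSON escaping is removed.
SEPARATOR = r"\.\s+"


def split_sentences(text):
    """Split text on a period followed by whitespace, dropping empty pieces."""
    return [part for part in re.split(SEPARATOR, text) if part]


def matching_inputs(prefix, inputs):
    """Crude stand-in for prefix completion: case-insensitive startswith."""
    return [s for s in inputs if s.lower().startswith(prefix.lower())]


text1 = "The crazy fox. The cat drinks milk. John goes to the beach"
inputs = split_sentences(text1)

print(inputs)
print(matching_inputs("John", inputs))    # a sentence starts with "John"
print(matching_inputs("drinks", inputs))  # mid-sentence word: no match
```

This is also why the caseContent field in the question returned nothing for "uni": the stored input does not begin with "Union", it merely contains it.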

Finally, you can search for the prefix of any sentence. Below we search for John, which should return a suggestion:

POST test/_search?pretty
{
  "suggest": {
    "text1-suggest": {
      "prefix": "John",
      "completion": {
        "field": "text1Suggest"
      }
    }
  }
}
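Since you end up with one suggest entry per field, you will probably want to merge the options on the client side. Below is a minimal Python sketch of that, assuming the response has the shape shown in the question (suggest, then suggester name, then a list of entries each holding options). The function name extract_suggestions and the trimmed sample dict are illustrative only.

```python
def extract_suggestions(response):
    """Collect unique option texts from every suggester in a _search response."""
    seen = []
    for entries in response.get("suggest", {}).values():
        for entry in entries:
            for option in entry.get("options", []):
                text = option["text"]
                if text not in seen:  # keep first occurrence, preserve order
                    seen.append(text)
    return seen


# Trimmed-down version of the response posted in the question.
sample = {
    "suggest": {
        "caseContent-suggest": [
            {"text": "uni", "offset": 0, "length": 3, "options": []}
        ],
        "respondent-suggest": [
            {
                "text": "uni",
                "offset": 0,
                "length": 3,
                "options": [{"text": "Union of India", "_score": 1.0}],
            }
        ],
    }
}

print(extract_suggestions(sample))
```

This just flattens and de-duplicates; if you need ranking across fields you would have to decide how to compare the _score values yourself.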

Upvotes: 5
