CoffeJunky

Reputation: 1087

Elasticsearch: boost "nicer" terms

Imagine the following: the search term is "brown fox".

All documents have another field; call it alternate name.

This field sometimes contains nice terms like "Animal" or "Fox". Sometimes it contains "not nice" (not human-readable) terms like ED2314 or 1231-234-D.

What the "not nice" values have in common is that they contain a "high" proportion of digits, or they are simply not human language, e.g. WIPSDIFOW.

Any ideas on how to boost or sort the nicer terms to the top?

Update 2016-01-24: Thank you for the question.

The search will be done on the field "name", for example. The "alternate name" field won't be queried with the user input; it is only relevant for sorting. The user would like to see it in the result list, but ordered with the sorting / boosting described above. Thx

Upvotes: 1

Views: 313

Answers (2)

Peter Dixon-Moses

Reputation: 3209

You could do some limited script-based scoring and sorting (with a performance penalty).

But if this is the first of (likely) many requests to "tweak scoring" based on unstructured data, you'd be better served by annotating your data before indexing, so that the scoring logic is codified clearly in the index.


I.e., add fields like alternate_name.dictionary_words, alternate_name.non_dictionary_words (and maybe alternate_name.dictionary_word_composition_percent), and use a dictionary to enrich the data set just before you load it.
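A minimal sketch of that enrichment step, in Python. The tiny `DICTIONARY` set, the tokenizing regex, and the function name `annotate_alternate_name` are all assumptions made for illustration; a real pipeline would plug in a proper word list and whatever tokenization suits the data.

```python
import re

# Hypothetical stand-in for a real dictionary / word list.
DICTIONARY = {"animal", "fox", "brown", "dog"}

# Assumed tokenization: alphanumeric runs, optionally joined by hyphens,
# so that a code like "1231-234-D" stays a single token.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def annotate_alternate_name(value, dictionary=DICTIONARY):
    """Split an alternate name into dictionary and non-dictionary tokens."""
    tokens = TOKEN_RE.findall(value)
    words = [t for t in tokens if t.lower() in dictionary]
    others = [t for t in tokens if t.lower() not in dictionary]
    # Share of tokens found in the dictionary, as a percentage.
    percent = round(100.0 * len(words) / len(tokens), 1) if tokens else 0.0
    return {
        "dictionary_words": words,
        "non_dictionary_words": others,
        "dictionary_word_composition_percent": percent,
    }
```

The returned values would be stored alongside the original alternate name before the document is indexed; how they are nested in the mapping is left open here.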

The advantage here is that the scoring strategy appears in the data, query performance (when including the "percent" field in your scoring or sorting criteria) is better, and you'll be able to use the human-readable terms in isolation for future features (facets/autocomplete/spellcheck). Plus, the non-human-readable terms will be more accessible for future analysis (when, e.g., you have enough information to annotate or separate out, say, "part_numbers").
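With such a pre-computed field in place, the search request could look roughly like the body built below. This is only a sketch: the helper `build_sorted_search` is hypothetical, and it assumes the percentage is indexed as a numeric field under the name used in this answer, with the user input matched against "name" as described in the question's update.

```python
import json

# Assumed field name for the pre-computed percentage (numeric mapping assumed).
PERCENT_FIELD = "alternate_name.dictionary_word_composition_percent"


def build_sorted_search(user_input, search_field="name",
                        percent_field=PERCENT_FIELD):
    """Build a search body that matches on one field and sorts by 'niceness'."""
    return {
        "query": {"match": {search_field: {"query": user_input}}},
        "sort": [
            # Documents with the most dictionary words come first...
            {percent_field: {"order": "desc"}},
            # ...and relevance only breaks ties.
            "_score",
        ],
    }


print(json.dumps(build_sorted_search("brown fox"), indent=2))
```

Sorting on the percentage first makes relevance secondary. If the nicer terms should only nudge the ranking instead, blending the field into the score (for example via a function_score query) may fit better.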

Upvotes: 1

Igor Belo

Reputation: 738

Using the bool query, you can boost nice terms by wrapping them as query clauses under the should key:

{
    "query": {
        "bool": {
            "must": {
                "match": {  
                    "field": {
                        "query": "User input"
                    }
                }
            },
            "should": [
                { "match": {
                    "field": {
                        "query": "Animal"
                    }
                }},
                { "match": {
                    "field": {
                        "query": "Fox"
                    }
                }}
            ]
        }
    }
}

To control the relevance of the nice terms, you may use the boost option as well:

...
{
  "match": {
    "field": {
      "query": "Fox",
      "boost": 3
    }
  }
}
...

See Reference.
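If the list of nice terms is long, the same request body can be generated rather than written by hand. The sketch below is one way to do that in Python; `build_boosted_query` and its parameters are hypothetical names, and `boost_field` is an assumption added so the should clauses can target a different field (such as the alternate name) than the one matched against the user input.

```python
def build_boosted_query(user_input, nice_terms, field="field",
                        boost_field=None, boost=3):
    """Build a bool query: must match the input, should match nice terms."""
    # By default the should clauses use the same field as the must clause.
    boost_field = boost_field or field
    should = [
        {"match": {boost_field: {"query": term, "boost": boost}}}
        for term in nice_terms
    ]
    return {
        "query": {
            "bool": {
                "must": {"match": {field: {"query": user_input}}},
                "should": should,
            }
        }
    }


body = build_boosted_query("brown fox", ["Animal", "Fox"],
                           field="name", boost_field="alternate_name")
```

Documents that match only the must clause are still returned; each matching should clause adds to their score, weighted by the boost.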

Upvotes: 0
