Typewar

Reputation: 983

multiple words act as single word in search - Elasticsearch

I have an issue where multi-word tags such as social media, two words, or tag with many spaces get a score that is multiplied by the number of words in the search query.

How can I make a multi-word tag be searched as a single term, so that searching two and searching two words give the same score?

Here is a visual representation of the current scores:

+-----------------------+-------+
| search                | score |
+-----------------------+-------+
| two                   | 2.76  |
| two words             | 5.53  |
| tag with many spaces  | 11.05 |
| singleword            | 2.76  |
+-----------------------+-------+

Here is a visual representation of what I want:

+-----------------------+-------+
| search                | score |
+-----------------------+-------+
| two                   | 2.76  |
| two words             | 2.76  |
| tag with many spaces  | 2.76  |
| singleword            | 2.76  |
+-----------------------+-------+

Each document has multiple tags. The search input is split on commas (,) in PHP, and each resulting tag becomes one clause in the query below.

Assuming a document has multiple tags, including two words and singleword, this would be the search query:

"query": {
    "function_score": {
        "query": {
            "bool": {
                "should": [
                    {
                        "match": {
                            "tags.name": "two words"
                        }
                    },
                    {
                        "match": {
                            "tags.name": "singleword"
                        }
                    }
                ]
            }
        },
        "functions": [
            {
                "field_value_factor": {
                    "field": "tags.votes"
                }
            }
        ],
        "boost_mode": "multiply"
    }
}
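
For illustration, the comma-splitting step described above could look roughly like the following. This is a minimal sketch in Python rather than the original PHP (which is not shown), and the helper name build_tag_query and its parameters are hypothetical:

```python
import json


def build_tag_query(raw_tags, field="tags.name"):
    """Build the function_score query from a comma-separated tag string.

    Hypothetical helper: mirrors the query shown above, one match clause per tag.
    """
    # Split on commas and drop surrounding whitespace and empty entries.
    names = [t.strip() for t in raw_tags.split(",") if t.strip()]
    should = [{"match": {field: name}} for name in names]
    return {
        "query": {
            "function_score": {
                "query": {"bool": {"should": should}},
                "functions": [
                    {"field_value_factor": {"field": "tags.votes"}}
                ],
                "boost_mode": "multiply",
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_tag_query("two words, singleword"), indent=4))
```

Note that the split only separates tags from each other; a multi-word tag such as two words is still passed to the match query as one string.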

The score is different when searching for two instead of two words.

Here is what the result looks like when searching for two words:

{
    "_index": "index",
    "_type": "type",
    "_id": "u10q42cCZsbFNf1W0Tdq",
    "_score": 4.708793,
    "_source": {
        "url": "example.com",
        "title": "title of the document",
        "description": "some description of the document",
        "popularity": 9,
        "tags": [
            {
                "name": "two words",
                "votes": 1
            },
            {
                "name": "singleword",
                "votes": 1
            },
            {
                "name": "othertag",
                "votes": 1
            },
            {
                "name": "random",
                "votes": 1
            }
        ]
    }
}

Here is the result when searching for two instead of two words:

{
    "_index": "index",
    "_type": "type",
    "_id": "u10q42cCZsbFNf1W0Tdq",
    "_score": 3.4481666,
    "_source": {
        "url": "example.com",
        "title": "title of the document",
        "description": "some description of the document",
        "popularity": 9,
        "tags": [
            {
                "name": "two words",
                "votes": 1
            },
            {
                "name": "singleword",
                "votes": 1
            },
            {
                "name": "othertag",
                "votes": 1
            },
            {
                "name": "random",
                "votes": 1
            }
        ]
    }
}

Here is the mapping (for the tags specifically):

"tags": {
  "type": "nested",
  "include_in_parent": true,
  "properties": {
    "name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "votes": {
      "type": "long"
    }
  }
}

I have tried searching with "\"two words\"" and "*two words*", but it made no difference.

Is it possible to achieve this?

Upvotes: 1

Views: 3457

Answers (1)

Pierre Mallet

Reputation: 7221

You should match against the non-analyzed keyword sub-field (tags.name.keyword) and switch to a term query.

Can you try:

"query": {
    "function_score": {
        "query": {
            "bool": {
                "should": [
                    {
                        "term": {
                            "tags.name.keyword": "two words"
                        }
                    },
                    {
                        "term": {
                            "tags.name.keyword": "singleword"
                        }
                    }
                ]
            }
        },
        "functions": [
            {
                "field_value_factor": {
                    "field": "tags.votes"
                }
            }
        ],
        "boost_mode": "multiply"
    }
}

With your current implementation, a match query for "two words" is analyzed, so it searches your tags for the tokens "two" and "words" separately. A document with the tag "two words" matches both tokens, and each match adds to its score. A term query on tags.name.keyword is not analyzed, so the whole tag is compared as a single term.
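
To make the difference concrete, here is a rough Python sketch. It does not call Elasticsearch; standard_like_tokens is a hypothetical, simplified stand-in for the default analyzer on a text field (lowercase, split on non-alphanumeric characters), and the real analyzer handles more cases than this:

```python
import re


def standard_like_tokens(text):
    """Simplified stand-in for the analyzer applied to a text field."""
    # Lowercase, split on anything that is not a letter or digit, drop empties.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def match_clause_count(query_text):
    """A match query on a text field looks up one term per analyzed token."""
    return len(standard_like_tokens(query_text))


def keyword_terms(text):
    """A keyword field keeps the whole value as a single, unanalyzed term."""
    return [text]


if __name__ == "__main__":
    for tag in ["two", "two words", "tag with many spaces", "singleword"]:
        print(tag, "|", match_clause_count(tag), "|", len(keyword_terms(tag)))
```

Under these assumptions the token counts are 1, 2, 4 and 1, which lines up with the roughly 1x, 2x, 4x and 1x scores in the question's first table, while the keyword field always yields exactly one term per tag.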

Upvotes: 2
