J.Sch

Reputation: 23

Elasticsearch: find documents containing not more terms than in the query

If I have documents:

1: { "name": "red yellow" }
2: { "name": "green yellow" }

I'd like to query with "red brown yellow" and get document 1.

That is, the query must contain at least the terms from my document, but it may contain more. If the document contains a token that is not in the query, there should be no hit.

How can I do this? The other way around is easy ...

Upvotes: 2

Views: 195

Answers (2)

Bhavya

Reputation: 16192

The match query is of type boolean. It means that the text provided is analyzed and the analysis process constructs a boolean query from the provided text. The minimum number of optional should clauses to match can be set using the minimum_should_match parameter.

To learn more about the match query, refer to the Elasticsearch documentation.

Below is the mapping of the name field:

{
  "tests": {
    "mappings": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

Now search for "red brown yellow" with the query below:

POST tests/_search
{
  "query": {
    "match": {
      "name": {
        "query": "red brown yellow",
        "minimum_should_match": "75%"
      }
    }
  }
}

You get your required result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 0.87546873,
    "hits": [
      {
        "_index": "tests",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.87546873,
        "_source": {
          "name": "red yellow"
        }
      }
    ]
  }
}

The output does not include "green yellow". With three query terms, 75% works out to 2.25, which Elasticsearch rounds down to 2 required terms; the second document matches only one of them ("yellow"), so it is excluded.
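To make the arithmetic concrete, here is a minimal Python sketch of the matching rule described above. It is only a model, not Elasticsearch code: the helper names `min_should_match` and `matches` are hypothetical, and splitting on whitespace is an assumed stand-in for the real analyzer.

```python
def min_should_match(num_terms: int, percent: int) -> int:
    """Number of query terms that must match; the percentage is rounded down."""
    return (num_terms * percent) // 100


def matches(doc_text: str, query_text: str, percent: int) -> bool:
    """True if the document contains enough of the query terms."""
    # Assumption: lowercase + whitespace split approximates the analyzer.
    query_terms = set(query_text.lower().split())
    doc_terms = set(doc_text.lower().split())
    required = min_should_match(len(query_terms), percent)
    return len(query_terms & doc_terms) >= required


# The two documents from the question.
docs = {1: "red yellow", 2: "green yellow"}
hits = [doc_id for doc_id, text in docs.items()
        if matches(text, "red brown yellow", 75)]
print(hits)
```

Under this model only document 1 is returned. Note that a hypothetical document such as "red green yellow" would also pass, since it still matches two of the query terms; the sample data does not cover that case.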

Upvotes: 0

Luc Ebert

Reputation: 1255

First you have to enable fielddata on the field ("fielddata": true) so that a script can read its terms:

PUT test
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fielddata": true
      }
    }
  }
}

Then, you can filter your result with a script on your query:

POST test/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
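          "script": {
            "source": """
                // Reject the document if any of its terms is missing from the query terms.
                // Whole terms are compared, so a term like "row" does not pass as part of "brown".
                for (item in doc['name']) {
                  if (!params.terms.contains(item)) {
                    return false;
                  }
                }
                return true;
              """,
            "lang": "painless",
            "params": {
              "terms": ["red", "brown", "yellow"]
            }
          }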
        }
      },
      "must": [
        {
          "match": {
            "name": "red brown yellow"
          }
        }
      ]
    }
  }
}

Note that fielddata on a text field can be expensive in memory, so it is better if you can index this field as an array of keyword values, as follows:

1: { "name": ["red","yellow"] }
2: { "name": ["green", "yellow"] }

The search request can stay exactly the same.
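The condition the script filter is meant to enforce is a subset test: every term of the document must appear among the query terms. A minimal Python sketch of that check follows; the function name `tokens_within_query` is hypothetical, and whitespace splitting is assumed in place of the real analyzer.

```python
def tokens_within_query(doc_tokens: list[str], query_text: str) -> bool:
    """True if every document token is one of the query terms."""
    # Compare whole terms, not substrings of the query string.
    query_terms = set(query_text.lower().split())
    return all(token in query_terms for token in doc_tokens)


# The two documents from the question, indexed as arrays of terms.
docs = {1: ["red", "yellow"], 2: ["green", "yellow"]}
hits = [doc_id for doc_id, tokens in docs.items()
        if tokens_within_query(tokens, "red brown yellow")]
print(hits)
```

Comparing against a set of terms matters: a plain substring test on the query string would accept a token such as "row", because it occurs inside "brown".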

Upvotes: 1
