Fizmeister

Reputation: 109

How to convert Lucene query string to Elasticsearch Match/Match_Prefix etc equivalent

I am currently migrating from Solr v3 to Elasticsearch v5.11. How would I convert the query string below to an equivalent Elasticsearch query built from Match, Match Phrase and similar queries? Is this even possible?

(entityName:(john AND lewis OR "john lewis") 
OR entityNameText:(john AND lewis OR "john lewis")) 
AND (status:("A" OR "I"))

Here is my attempt so far, covering only the first set of brackets, but it doesn't seem correct:

{
"bool": {
    "should": [
        [{
            "bool": {
                "should": [
                    [{
                        "match_phrase": {
                            "entityName": "john lewis"
                        }
                    }]
                ],
                "must": [
                    [{
                        "match": {
                            "entityName": {
                                "query": "john lewis",
                                "operator": "and"
                            }
                        }
                    }]
                ]
            }
        }, {
            "bool": {
                "should": [
                    [{
                        "match_phrase": {
                            "entityNameText": "john lewis"
                        }
                    }]
                ],
                "must": [
                    [{
                        "match": {
                            "entityNameText": {
                                "query": "john lewis",
                                "operator": "and"
                            }
                        }
                    }]
                ]
            }
        }]
    ]
}
}

Thanks

Updated:

entityName and entityNameText are both mapped as text types with custom analyzers for both indexing and querying. status is mapped as a keyword type.

Upvotes: 2

Views: 5680

Answers (2)

Fizmeister

Reputation: 109

Posting the answer for anyone who is interested in this in the future. I wrote two alternative queries using the ES Query DSL and found both to be equivalent to the original Lucene query, returning exactly the same results, though I'm not entirely sure why. Not sure whether that's a pro or a con of the ES Query DSL.

Original Lucene Query:

{
  "query": {
    "query_string": {
      "query": "entityName:(john AND Lewis OR \"john Lewis\") OR entityNameText:(john AND Lewis OR \"john Lewis\")"
    }
  }
}
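
Hand-escaping the inner quotes in the query_string body is easy to get wrong. A minimal sketch, assuming the request body is assembled in Python; the helper name lucene_query_body is made up for illustration, and json.dumps takes care of the escaping:

```python
import json


def lucene_query_body(lucene: str) -> dict:
    """Wrap a raw Lucene expression in a query_string request body.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    return {"query": {"query_string": {"query": lucene}}}


# Same expression as in the query above; the phrase quotes are written plainly.
lucene = (
    'entityName:(john AND Lewis OR "john Lewis") '
    'OR entityNameText:(john AND Lewis OR "john Lewis")'
)

body = lucene_query_body(lucene)
# json.dumps escapes the embedded double quotes as \" in the serialized output.
print(json.dumps(body, indent=2))
```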

Query alternative 1:

{
"bool": {
    "should": [
        [{
            "bool": {
                "should": [
                    [{
                        "match": {
                            "entityName": {
                                "query": "john Lewis",
                                "operator": "and"
                            }
                        }
                    }, {
                        "match_phrase": {
                            "entityName": "john Lewis"
                        }
                    }]
                ]
            }
        }, {
            "bool": {
                "should": [
                    [{
                        "match": {
                            "entityNameText": {
                                "query": "john Lewis",
                                "operator": "and"
                            }
                        }
                    }, {
                        "match_phrase": {
                            "entityNameText": "john Lewis"
                        }
                    }]
                ]
            }
        }]
    ]
}
}
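
Alternative 1 repeats the same pair of clauses for each field, so it can be generated instead of written out. This is a sketch under the assumption that the request body is built in Python; name_clause and any_field are illustrative names, and the arrays are written in the flat form used in the Elasticsearch documentation:

```python
import json


def name_clause(field: str, text: str) -> dict:
    """All terms present (match with operator "and") OR the exact phrase."""
    return {
        "bool": {
            "should": [
                {"match": {field: {"query": text, "operator": "and"}}},
                {"match_phrase": {field: text}},
            ]
        }
    }


def any_field(fields: list, text: str) -> dict:
    """OR the per-field clauses together."""
    return {"bool": {"should": [name_clause(f, text) for f in fields]}}


query = any_field(["entityName", "entityNameText"], "john Lewis")
print(json.dumps(query, indent=2))
```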

Query alternative 2:

{
"bool": {
    "should": [
        [{
            "multi_match": {
                "query": "john Lewis",
                "type": "most_fields",
                "fields": ["entityName", "entityNameText"],
                "operator": "and"
            }
        }, {
            "multi_match": {
                "query": "john Lewis",
                "type": "phrase",
                "fields": ["entityName", "entityNameText"]
            }
        }]
    ]
}
}

With this mapping:

{
"entity": {
    "dynamic_templates": [{
        "catch_all": {
            "match_mapping_type": "*",
            "mapping": {
                "type": "text",
                "store": true,
                "analyzer": "phonetic_index",
                "search_analyzer": "phonetic_query"
            }
        }
    }],
    "_all": {
        "enabled": false
    },
    "properties": {
        "entityName": {
            "type": "text",
            "store": true,
            "analyzer": "indexed_index",
            "search_analyzer": "indexed_query",
            "fields": {
                "entityNameLower": {
                    "type": "text",
                    "analyzer": "lowercase"
                },
                "entityNameText": {
                    "type": "text",
                    "store": true,
                    "analyzer": "text_index",
                    "search_analyzer": "text_query"
                },
                "entityNameNgram": {
                    "type": "text",
                    "analyzer": "ngram_index",
                    "search_analyzer": "ngram_query"
                },
                "entityNamePhonetic": {
                    "type": "text",
                    "analyzer": "ngram_index",
                    "search_analyzer": "ngram_query"
                }
            }
        },
        "status": {
            "type": "keyword",
            "norms": false,
            "store": true
        }
    }
}
}
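
Neither alternative covers the status clause from the question. Since the mapping above declares status as a keyword field, one way to add it is a terms clause in filter context. This is a sketch that has not been run against the index above; with_status is a hypothetical helper, and the values "A" and "I" are taken from the question as written:

```python
import json


def with_status(name_query: dict, statuses: list) -> dict:
    """Require name_query to match and restrict results to the given statuses.

    Hypothetical helper. The terms clause sits in filter context, so it
    narrows the result set without contributing to the score.
    """
    return {
        "bool": {
            "must": [name_query],
            "filter": [{"terms": {"status": statuses}}],
        }
    }


# Alternative 2 from above, written with flat arrays.
name_query = {
    "bool": {
        "should": [
            {"multi_match": {"query": "john Lewis", "type": "most_fields",
                             "fields": ["entityName", "entityNameText"],
                             "operator": "and"}},
            {"multi_match": {"query": "john Lewis", "type": "phrase",
                             "fields": ["entityName", "entityNameText"]}},
        ]
    }
}

full_query = with_status(name_query, ["A", "I"])
print(json.dumps({"query": full_query}, indent=2))
```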

Upvotes: 2

jhilden

Reputation: 12459

The answer will depend on how you've specified your mapping, but I'll assume that you did no custom mapping.

Let's break down the different parts first, then we'll put them all back together.

status:("A" OR "I")

This is a "terms" query; think of it as a SQL "IN" clause.

  "terms": {
    "status": [
      "a",
      "i"
    ]
  }
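
To make the IN analogy concrete, here is a small Python model of what a terms clause does. It is only an illustration, not how Elasticsearch is implemented: terms_matches is a made-up function, and it assumes the field holds a single term per document.

```python
def terms_matches(indexed_term: str, wanted: list) -> bool:
    """Model of a terms clause: true if the indexed term equals any wanted term.

    The comparison is exact because term-level queries do not analyze
    their input.
    """
    return indexed_term in wanted


docs = [
    {"id": 1, "status": "a"},
    {"id": 2, "status": "i"},
    {"id": 3, "status": "x"},
    {"id": 4, "status": "A"},  # differs from "a" only by case
]

hits = [d["id"] for d in docs if terms_matches(d["status"], ["a", "i"])]
print(hits)
```

Document 4 is not returned in this model. The lower-case terms only line up with the index if the field's analyzer lowercases values at index time; a keyword field stores "A" unchanged, in which case the terms would need to be upper-case as well.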

entityName:(john AND lewis OR "john lewis")

Elasticsearch analyzes string fields into individual terms. We can use this to our advantage here by using another "terms" query. We don't need to specify it as 3 different parts; ES will handle that under the hood.

  "terms": {
    "entityName": [
      "john",
      "lewis"
    ]
  }

entityNameText:(john AND lewis OR "john lewis")

Exactly the same logic as above, just searching on a different field.

"terms": { "entityNameText": [ "john", "lewis" ] }

AND vs OR

In an ES bool query, AND = "must" and OR = "should".
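
That mapping can be wrapped in two tiny helpers. A sketch with made-up names (all_of, any_of); minimum_should_match is set explicitly because should clauses become optional once the same bool also contains a must or filter clause:

```python
def all_of(*clauses: dict) -> dict:
    """AND: every clause has to match."""
    return {"bool": {"must": list(clauses)}}


def any_of(*clauses: dict) -> dict:
    """OR: at least one clause has to match."""
    return {"bool": {"should": list(clauses), "minimum_should_match": 1}}


# (A OR B) AND C, using placeholder term clauses on a hypothetical field "f"
a = {"term": {"f": "a"}}
b = {"term": {"f": "b"}}
c = {"term": {"f": "c"}}

query = all_of(any_of(a, b), c)
print(query)
```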

Put it all together

GET test1/type1/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "terms": {
            "status": [
              "a",
              "i"
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "terms": {
                  "entityName": [
                    "john",
                    "lewis"
                  ]
                }
              },
              {
                "terms": {
                  "entityNameText": [
                    "john",
                    "lewis"
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}
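
One thing worth checking against your own data: a terms clause matches when any of the listed terms is present, whereas john AND lewis asks for both. The difference shows up in a toy Python model. It assumes a plain lowercase, whitespace-splitting tokenizer, which is a simplification and not the analyzer chain of any real index:

```python
def tokens(text: str) -> set:
    """Toy analyzer: lowercase and split on whitespace."""
    return set(text.lower().split())


def terms_any(text: str, wanted: list) -> bool:
    """Model of a terms clause: at least one wanted term is present."""
    return bool(tokens(text) & set(wanted))


def match_all(text: str, query: str) -> bool:
    """Model of match with operator "and": every query term is present."""
    return tokens(query) <= tokens(text)


names = ["John Lewis", "John Smith", "Lewis Hamilton"]

any_hits = [n for n in names if terms_any(n, ["john", "lewis"])]
all_hits = [n for n in names if match_all(n, "john lewis")]
print(any_hits)
print(all_hits)
```

In this model the terms version returns all three names, while the all-terms version returns only "John Lewis". If the broader result set is not what you want, a match query with "operator": "and" is the closer fit.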

Below is a link to the full setup I used to test the query.

https://gist.github.com/jayhilden/cf251cd751ef8dce7a57df1d03396778

Upvotes: 0
