Jim G.

Reputation: 15365

ElasticSearch and NEST - How do I construct a simple OR query?

I'm developing a building repository query.

Here is the query that I am trying to write.

(Exact match on zipCode) AND ((Case-insensitive exact match on address1) OR (Case-insensitive exact match on siteName))

In my repository, I have a document that looks like the following:

address1: 4 Myrtle Street
siteName: Myrtle Street
zipCode: 90210

Yet I keep getting that document back when I search for:

address1: 45 Myrtle Street
siteName: Myrtle
zipCode: 90210

Here are some attempts that have not worked:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "term": {
                  "address1": {
                    "value": "45 myrtle street"
                  }
                }
              },
              {
                "term": {
                  "siteName": {
                    "value": "myrtle"
                  }
                }
              }
            ]
          }
        },
        {
          "term": {
            "zipCode": {
              "value": "90210"
            }
          }
        }
      ]
    }
  }
}


{
  "query": {
    "filtered": {
      "query": {
        "term": {
          "zipCode": {
            "value": "90210"
          }
        }
      },
      "filter": {
        "or": {
          "filters": [
            {
              "term": {
                "address1": "45 myrtle street"
              }
            },
            {
              "term": {
                "siteName": "myrtle"
              }
            }
          ]
        }
      }
    }
  }
}




{
  "filter": {
    "bool": {
      "must": [
        {
          "or": {
            "filters": [
              {
                "term": {
                  "address1": "45 myrtle street"
                }
              },
              {
                "term": {
                  "siteName": "myrtle"
                }
              }
            ]
          }
        },
        {
          "term": {
            "zipCode": "90210"
          }
        }
      ]
    }
  }
}




{
  "query": {
    "bool": {
      "must": [
        {
          "span_or": {
            "clauses": [
              {
                "span_term": {
                  "siteName": {
                    "value": "myrtle"
                  }
                }
              }
            ]
          }
        },
        {
          "term": {
            "zipCode": {
              "value": "90210"
            }
          }
        }
      ]
    }
  }
}




Does anyone know what I am doing wrong?

I'm writing this with NEST, so I would prefer NEST syntax, but ElasticSearch syntax would certainly suffice as well.

EDIT: Per @Greg Marzouka's comment, here are the mappings:

{
   [indexname]: {
      "mappings": {
         "[indexname]elasticsearchresponse": {
            "properties": {
               "address": {
                  "type": "string"
               },
               "address1": {
                  "type": "string"
               },
               "address2": {
                  "type": "string"
               },
               "address3": {
                  "type": "string"
               },
               "city": {
                  "type": "string"
               },
               "country": {
                  "type": "string"
               },
               "id": {
                  "type": "string"
               },
               "originalSourceId": {
                  "type": "string"
               },
               "placeId": {
                  "type": "string"
               },
               "siteName": {
                  "type": "string"
               },
               "siteType": {
                  "type": "string"
               },
               "state": {
                  "type": "string"
               },
               "systemId": {
                  "type": "long"
               },
               "zipCode": {
                  "type": "string"
               }
            }
         }
      }
   }
}

Upvotes: 1

Views: 494

Answers (1)

Greg Marzouka

Reputation: 3325

Based on your mapping, you won't be able to search for exact matches on siteName (or address1, which is mapped the same way) because it's being analyzed with the standard analyzer, which is tuned for full-text search rather than exact matching. This is the default analyzer that Elasticsearch applies when one isn't explicitly defined on a field.

The standard analyzer breaks the value of siteName into multiple tokens. For example, Myrtle Street is tokenized and stored as two separate terms in your index, myrtle and street, which is why your term query for myrtle matches that document. For a case-insensitive exact match, you instead want Myrtle Street stored as a single, lower-cased token in your index: myrtle street.

You could set siteName to not_analyzed, which skips the analysis chain entirely, meaning the values will not be modified. This produces a single Myrtle Street token, which works for exact matches but is case-sensitive.
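For comparison, the not_analyzed variant would be a one-line change to the field's mapping. This is only a fragment in the same string-type syntax as the mappings in the question, shown to make the trade-off concrete:

```json
{
  "properties": {
    "siteName": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
```

With this mapping a term query for Myrtle Street would match, but one for myrtle street would not.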

What you need to do is create a custom analyzer using the keyword tokenizer and lowercase token filter, then apply it to your field.
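In raw Elasticsearch terms, the index-creation body could look roughly like the sketch below. The type name yourtype is a placeholder, and applying the analyzer to address1 as well is my assumption, based on the question asking for the same kind of match on both fields:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "yourtype": {
      "properties": {
        "siteName": { "type": "string", "analyzer": "my_analyzer" },
        "address1": { "type": "string", "analyzer": "my_analyzer" }
      }
    }
  }
}
```

The analyzer of an existing field can't be changed in place, so the documents have to be indexed into an index created with these settings.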

Here's how you can accomplish this with NEST's fluent API:

// Create the custom analyzer using the keyword tokenizer and lowercase token filter
var myAnalyzer = new CustomAnalyzer
{
    Tokenizer = "keyword",
    Filter = new [] { "lowercase" }
};

var response = this.Client.CreateIndex("your-index-name", c => c
    // Add the custom analyzer to your index settings
    .Analysis(an => an
        .Analyzers(az => az
            .Add("my_analyzer", myAnalyzer)
        )
    )
    // Create the mapping for your type and apply "my_analyzer" to the siteName field
    .AddMapping<YourType>(m => m
        .MapFromAttributes()
        .Properties(ps => ps
            .String(s => s.Name(t => t.SiteName).Analyzer("my_analyzer"))
        )
    )
);
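Once the fields are analyzed that way, the original AND/OR query could be written along these lines. This is a sketch, not something I've run against your index, and it assumes address1 has been mapped with my_analyzer too. It uses match rather than term for the two name fields because match runs the search text through the field's analyzer, so the input is lower-cased into a single token before comparison:

```json
{
  "query": {
    "bool": {
      "must": [
        { "term": { "zipCode": "90210" } },
        {
          "bool": {
            "should": [
              { "match": { "address1": "45 Myrtle Street" } },
              { "match": { "siteName": "Myrtle" } }
            ],
            "minimum_should_match": 1
          }
        }
      ]
    }
  }
}
```

A term query would also work here, as long as the calling code lower-cases the value itself, since term queries are not analyzed.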

Upvotes: 3
