Valentin Popa

Reputation: 1

Why does Elasticsearch return different results for 'contains' and 'within' in a geo_shape query when the indexed shape and the query shape are the same?

Consider the following:

I have an index with a geo_shape field. The shape can represent anything, but for this example the indexed shape is a rectangle.

We're creating the index with a geoshape field named location:

PUT /example
{
  "mappings": {
    "properties": {
      "location": {
        "type": "geo_shape"
      }
    }
  }
}

We index a document that contains a rectangle in the location field:

POST /example/_doc
{
  "location" : {
    "type": "polygon",
    "coordinates": [
      [
        [
          -1.738392029727578,
          52.3042810657775
        ],
        [
          -1.738392029727578,
          51.81371529802078
        ],
        [
          0.694240708108822,
          51.81371529802078
        ],
        [
          0.694240708108822,
          52.3042810657775
        ],
        [
          -1.738392029727578,
          52.3042810657775
        ]
      ]
    ]
  }
}

We search the index using a geo_shape query with an inline shape definition, with the relation set to within:

GET /example/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_shape": {
          "location": {
            "shape": {
              "type": "polygon",
              "coordinates": [
                [
                  [
                    -1.738392029727578,
                    52.3042810657775
                  ],
                  [
                    -1.738392029727578,
                    51.81371529802078
                  ],
                  [
                    0.694240708108822,
                    51.81371529802078
                  ],
                  [
                    0.694240708108822,
                    52.3042810657775
                  ],
                  [
                    -1.738392029727578,
                    52.3042810657775
                  ]
                ]
              ]
            },
            "relation": "within"
          }
        }
      }
    }
  }
}

The query returns the previously indexed document, which holds the same shape:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "example",
        "_id": "EO3gAIgBN8DGPp9VobXP",
        "_score": 1,
        "_source": {
          "location": {
            "type": "polygon",
            "coordinates": [
              [
                [
                  -1.738392029727578,
                  52.3042810657775
                ],
                [
                  -1.738392029727578,
                  51.81371529802078
                ],
                [
                  0.694240708108822,
                  51.81371529802078
                ],
                [
                  0.694240708108822,
                  52.3042810657775
                ],
                [
                  -1.738392029727578,
                  52.3042810657775
                ]
              ]
            ]
          }
        }
      }
    ]
  }
}

If we change the spatial relation from within to contains:

GET /example/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_shape": {
          "location": {
            "shape": {
              "type": "polygon",
              "coordinates": [
                [
                  [
                    -1.738392029727578,
                    52.3042810657775
                  ],
                  [
                    -1.738392029727578,
                    51.81371529802078
                  ],
                  [
                    0.694240708108822,
                    51.81371529802078
                  ],
                  [
                    0.694240708108822,
                    52.3042810657775
                  ],
                  [
                    -1.738392029727578,
                    52.3042810657775
                  ]
                ]
              ]
            },
            "relation": "contains"
          }
        }
      }
    }
  }
}

We're not getting back any result:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}

The behavior is the same when the query uses a pre-indexed shape instead of an inline one (shown here with within):

GET /example/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_shape": {
          "location": {
            "indexed_shape": {
              "index": "example",
              "id": "EO3gAIgBN8DGPp9VobXP",
              "path": "location"
            },
            "relation": "within"
          }
        }
      }
    }
  }
}

I've omitted the response, since it is identical to the within response shown earlier.

From the Elasticsearch documentation:

WITHIN - Return all documents whose geo_shape or geo_point field is within the query geometry. Line geometries are not supported.

CONTAINS - Return all documents whose geo_shape or geo_point field contains the query geometry.

Since the geo_shape field contains the same geometry as the one passed in the query (as an inline shape definition or a pre-indexed shape), why are the results inconsistent? I would expect both relations to return either no documents or the same document.

I've already looked for a similar issue in various places: GitHub issues, https://discuss.elastic.co, Stack Overflow, etc.

I'll try to have a look at the source code, but I doubt I'll be able to make much of it.

Upvotes: 0

Views: 268

Answers (1)

Josef Veselý

Reputation: 112

It's just a matter of how Elasticsearch interprets the two relations:

  • within matches because Elasticsearch allows an object to be within itself.
  • contains does NOT match because Elasticsearch treats "contains" as a strict enclosure, requiring the query shape to be strictly inside, not just identical.

In other words:

  • Object A is within object B if it is completely enclosed by object B (an identical shape counts)
  • Object A contains object B if it encloses B and there is still some extra space left over in object A
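
One way to check this interpretation is to shrink the query rectangle slightly and re-run the contains query. If the explanation above holds, the indexed document should then be returned. The sketch below only builds the request body and does not talk to Elasticsearch. The helper names shrink_rectangle and build_geo_shape_query are hypothetical, and the 1e-4 degree margin is an arbitrary assumption, not a value taken from the Elasticsearch documentation.

```python
import json

# Corner extremes of the rectangle from the question (longitude, latitude).
MIN_LON, MAX_LON = -1.738392029727578, 0.694240708108822
MIN_LAT, MAX_LAT = 51.81371529802078, 52.3042810657775


def shrink_rectangle(min_lon, min_lat, max_lon, max_lat, margin):
    """Hypothetical helper: return a closed polygon ring for the rectangle
    pulled in by `margin` degrees on every side."""
    if margin <= 0:
        raise ValueError("margin must be positive")
    lo_lon, hi_lon = min_lon + margin, max_lon - margin
    lo_lat, hi_lat = min_lat + margin, max_lat - margin
    if lo_lon >= hi_lon or lo_lat >= hi_lat:
        raise ValueError("margin is too large for this rectangle")
    # Same vertex order as in the question; the last point closes the ring.
    return [
        [lo_lon, hi_lat],
        [lo_lon, lo_lat],
        [hi_lon, lo_lat],
        [hi_lon, hi_lat],
        [lo_lon, hi_lat],
    ]


def build_geo_shape_query(ring, relation, field="location"):
    """Hypothetical helper: build a geo_shape filter body like the ones
    in the question, with an inline polygon and the given relation."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_shape": {
                        field: {
                            "shape": {"type": "polygon", "coordinates": [ring]},
                            "relation": relation,
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Assumed margin: small, but well inside the indexed rectangle.
    ring = shrink_rectangle(MIN_LON, MIN_LAT, MAX_LON, MAX_LAT, margin=1e-4)
    print(json.dumps(build_geo_shape_query(ring, "contains"), indent=2))
```

The printed JSON can be pasted as the body of GET /example/_search. If it returns the document while the identical-shape contains query does not, that supports the reading that a shared boundary is what stops contains from matching.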

Upvotes: 0
