Ben Osborne
Ben Osborne

Reputation: 268

Elastic Search - How to join data between two types?

I am trying to explore elastic search for a project I am working on, but stuck with how to join two types of documents.

For example if I had 10 documents that are hotel availability rates, and 10 documents that were all flights to that destination that the hotel is in.

Usually in MySQL I would make a join based on the date, and duration of hotel and flight etc.

How would I return a single hotel document with the cheapest flight from the 10 that are available?

Upvotes: 0

Views: 1260

Answers (2)

Benjamin Trent
Benjamin Trent

Reputation: 7566

The closest thing I can think of to do what you want is Composite Aggregations. It is not a true join, but it COULD get you close to what you want.

Stipulations:

  • Fields have to have the same names between indexes
  • You will have to flatten out the resulting aggregation
  • All result fields (the ones you care about) will be aggregations of some sort

Here is a minimal example (hacked out in Kibana Console):

With the docs:

POST my-test1/_doc/_bulk
{"index": {}}
{"entityID":"entity1", "value": 12}
{"index": {}}
{"entityID":"entity1", "value": 22}
{"index": {}}
{"entityID":"entity2", "value": 2}
{"index": {}}
{"entityID":"entity2", "value": 12}


POST my-test2/_doc/_bulk
{"index": {}}
{"entityID":"entity1", "otherValue": 5}
{"index": {}}
{"entityID":"entity1", "otherValue": 1}
{"index": {}}
{"entityID":"entity2", "otherValue": 3}
{"index": {}}
{"entityID":"entity2", "otherValue": 7}

We will aggregate around the common entity field entityID

GET my-test*/_search
{
  "size": 0,
  "aggs": {
    "by-entity": {
      "composite": {
        "sources": [
          {
            "entityID": {
              "terms": {
                "field": "entityID.keyword"
              }
            }
          }
        ]
      },
      "aggs": {
        "value": {
          "avg": {
            "field": "value"
          }
        },
        "otherValue": {
          "avg": {
            "field": "otherValue"
          }
        }
      }
    }
  }
}

This will result in a response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 10,
    "successful" : 10,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 8,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "by-entity" : {
      "after_key" : {
        "entityID" : "entity2"
      },
      "buckets" : [
        {
          "key" : {
            "entityID" : "entity1"
          },
          "doc_count" : 4,
          "otherValue" : {
            "value" : 3.0
          },
          "value" : {
            "value" : 17.0
          }
        },
        {
          "key" : {
            "entityID" : "entity2"
          },
          "doc_count" : 4,
          "otherValue" : {
            "value" : 5.0
          },
          "value" : {
            "value" : 7.0
          }
        }
      ]
    }
  }
}

You could create a composite aggregation around many different fields and different bucket aggregations. So, you could create a terms aggregation for your hotel_id and combine it with a date_histogram around your timestamp.

Upvotes: 2

Chris Heald
Chris Heald

Reputation: 62688

Elasticsearch doesn't have cross-index joins (like most document databases). If you have to do this in ES, you would typically do it by denormalizing the data at index time. If you can't do that, then you'll have to do multiple queries.

If you need to do actually relational query work, you're better off using a relational DB such as MySQL or Postgres.

Upvotes: 0

Related Questions