Reputation: 31

Join/Merge Elasticsearch results

We have a documents with a (simplified) structure as shown here in Elasticsearch:

{ _id: ..., patientId: 4711, text: "blue" } 
{ _id: ..., patientId: 4711, text: "red" }
{ _id: ..., patientId: 4712, text: "blue" } 
{ _id: ..., patientId: 4712, text: "green" } 
{ ... }

How can I create a query to find all documents containing the text blue and red within the SAME patient.

In the above example I would expect a result set of two documents with patientId 4711 (contains blue and red).

Potential solution strategies might be :

Run two queries and "join" results afterward by application logic.
Run separate queries based on prior list of patients. Only feasible if number of potential patients are small.

Are there better ways (ideal one query) to handle this use case?

Upvotes: 2

Answers (4)

G0l0s

Reputation: 496

Answer #2. Aggregation

(See Answer #1)

Another solution strategy is aggregation. Aggregation allows you to idenify document and then retrieve their contents with an additional query. However, basket keys already allow you to compose the contents of documents

Mapping (See Answer #1)

Documents (See Answer #1)

Update by query copies the _id metafield into the id document field. It needs for aggregation

POST /patient_text/_update_by_query
{
    "query": {
        "match_all": {
        }
    },
    "script": {
        "source": """
            ctx._source.id = ctx['_id'];
        """
    }
}

Aggregation query

GET /patient_text/_search?filter_path=aggregations
{
    "query": {
        "bool": {
            "filter": [
                {
                    "terms": {
                        "text": [
                            "blue",
                            "red"
                        ]
                    }
                }
            ]
        }
    },
    "aggs": {
        "per_patient_id": {
            "terms": {
                "field": "patient_id"
            },
            "aggs": {
                "per_text_distinct_values": {
                    "terms": {
                        "field": "text"
                    },
                    "aggs": {
                        "per_id": {
                            "terms": {
                                "field": "id"
                            }
                        }
                    }
                },
                "text_distinct_values_filter": {
                    "bucket_selector": {
                        "buckets_path": {
                            "text_distinct_values": "per_text_distinct_values._bucket_count"
                        },
                        "script": "params.text_distinct_values == 2"
                    }
                }
            }
        }
    }
}

Response

{
    "aggregations" : {
        "per_patient_id" : {
            "doc_count_error_upper_bound" : 0,
            "sum_other_doc_count" : 0,
            "buckets" : [
                {
                    "key" : "4711",
                    "doc_count" : 2,
                    "per_text_distinct_values" : {
                        "doc_count_error_upper_bound" : 0,
                        "sum_other_doc_count" : 0,
                        "buckets" : [
                            {
                                "key" : "blue",
                                "doc_count" : 1,
                                "per_id" : {
                                    "doc_count_error_upper_bound" : 0,
                                    "sum_other_doc_count" : 0,
                                    "buckets" : [
                                        {
                                            "key" : "1",
                                            "doc_count" : 1
                                        }
                                    ]
                                }
                            },
                            {
                                "key" : "red",
                                "doc_count" : 1,
                                "per_id" : {
                                    "doc_count_error_upper_bound" : 0,
                                    "sum_other_doc_count" : 0,
                                    "buckets" : [
                                        {
                                            "key" : "2",
                                            "doc_count" : 1
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                },
                {
                    "key" : "4713",
                    "doc_count" : 2,
                    "per_text_distinct_values" : {
                        "doc_count_error_upper_bound" : 0,
                        "sum_other_doc_count" : 0,
                        "buckets" : [
                            {
                                "key" : "blue",
                                "doc_count" : 1,
                                "per_id" : {
                                    "doc_count_error_upper_bound" : 0,
                                    "sum_other_doc_count" : 0,
                                    "buckets" : [
                                        {
                                            "key" : "5",
                                            "doc_count" : 1
                                        }
                                    ]
                                }
                            },
                            {
                                "key" : "red",
                                "doc_count" : 1,
                                "per_id" : {
                                    "doc_count_error_upper_bound" : 0,
                                    "sum_other_doc_count" : 0,
                                    "buckets" : [
                                        {
                                            "key" : "6",
                                            "doc_count" : 1
                                        }
                                    ]
                                }
                            }
                        ]
                    }
                }
            ]
        }
    }
}

Documents with id 1, 2, 5, 6 are in the response. Their contents can be recovered with help of bucket keys.

Upvotes: 0

G0l0s

Reputation: 496

Answer #1. Collapsing Search Results

As @Prabin Meitei wrote one solution strategy is to use collapsing search results. All you have to do is parse the search results and extract document identifiers and fields

Mapping

PUT /patient_text
{
    "mappings": {
        "properties": {
            "id": {
                "type": "keyword"
            },
            "patient_id": {
                "type": "keyword"
            },
            "text": {
                "type": "keyword"
            }
        }
    }
}

Documents based on original ones

PUT /patient_text/_bulk
{"create":{"_id":1}}
{"patient_id": 4711, "text": "blue" } 
{"create":{"_id":2}}
{"patient_id": 4711, "text": "red" }
{"create":{"_id":3}}
{"patient_id": 4712, "text": "blue" } 
{"create":{"_id":4}}
{"patient_id": 4712, "text": "blue" }
{"create":{"_id":5}}
{"patient_id": 4713, "text": "blue" } 
{"create":{"_id":6}}
{"patient_id": 4713, "text": "red" }
{"create":{"_id":7}}
{"patient_id": 4714, "text": "green" } 
{"create":{"_id":8}}
{"patient_id": 4714, "text": "black" }

Search query with collapsing results

GET /patient_text/_search?filter_path=hits.hits.fields,hits.hits.inner_hits.texts.hits.hits.fields
{
    "query": {
        "terms": {
            "text": [
                "blue",
                "red"
            ]
        }
    },
    "collapse": {
        "field": "patient_id",
        "inner_hits": {
            "name": "texts",
            "fields": [
                "text"
            ],
            "_source": false
        }
    },
    "_source": false
}

Response

{
    "hits" : {
        "hits" : [
            {
                "fields" : {
                    "patient_id" : [
                        "4711"
                    ]
                },
                "inner_hits" : {
                    "texts" : {
                        "hits" : {
                            "hits" : [
                                {
                                    "fields" : {
                                        "text" : [
                                            "blue"
                                        ]
                                    }
                                },
                                {
                                    "fields" : {
                                        "text" : [
                                            "red"
                                        ]
                                    }
                                }
                            ]
                        }
                    }
                }
            },
            {
                "fields" : {
                    "patient_id" : [
                        "4712"
                    ]
                },
                "inner_hits" : {
                    "texts" : {
                        "hits" : {
                            "hits" : [
                                {
                                    "fields" : {
                                        "text" : [
                                            "blue"
                                        ]
                                    }
                                },
                                {
                                    "fields" : {
                                        "text" : [
                                            "blue"
                                        ]
                                    }
                                }
                            ]
                        }
                    }
                }
            },
            {
                "fields" : {
                    "patient_id" : [
                        "4713"
                    ]
                },
                "inner_hits" : {
                    "texts" : {
                        "hits" : {
                            "hits" : [
                                {
                                    "fields" : {
                                        "text" : [
                                            "blue"
                                        ]
                                    }
                                },
                                {
                                    "fields" : {
                                        "text" : [
                                            "red"
                                        ]
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}

Upvotes: 0

Harbeer Kadian

Reputation: 394

How about changing the way you store data into elastisearch. Just store one document for one patient id, and keep text as array of all distinct colors assigned to that patient.

Upvotes: 1

Prabin Meitei

Reputation: 2000

You can simply use bool query or bool filter

Example using bool filter

{
    "filtered" : {
        "query" : {
            "match_all" : { }
        },
        "filter" : {
            "bool" : {


                "Must" : [
                    {
                        "term" : { "text" : "blue" }
                    },
                    {
                        "term" : { "text" : "red" }
                    }
                ]
            }
        }
    }
}

Edit: misread the requirement:

You should be using field collapsing

Upvotes: 0

Join/Merge Elasticsearch results

Answers (4)

Answer #2. Aggregation

Answer #1. Collapsing Search Results

Related Questions