Reputation: 31
We have a documents with a (simplified) structure as shown here in Elasticsearch:
{ _id: ..., patientId: 4711, text: "blue" }
{ _id: ..., patientId: 4711, text: "red" }
{ _id: ..., patientId: 4712, text: "blue" }
{ _id: ..., patientId: 4712, text: "green" }
{ ... }
How can I create a query to find all documents containing the text
blue
and red
within the SAME patient.
In the above example I would expect a result set of two documents with patientId 4711
(contains blue
and red
).
Potential solution strategies might be :
Are there better ways (ideal one query) to handle this use case?
Upvotes: 2
Views: 4551
Reputation: 496
(See Answer #1)
Another solution strategy is aggregation. Aggregation allows you to idenify document and then retrieve their contents with an additional query. However, basket keys already allow you to compose the contents of documents
Mapping (See Answer #1)
Documents (See Answer #1)
Update by query copies the _id metafield into the id document field. It needs for aggregation
POST /patient_text/_update_by_query
{
"query": {
"match_all": {
}
},
"script": {
"source": """
ctx._source.id = ctx['_id'];
"""
}
}
Aggregation query
GET /patient_text/_search?filter_path=aggregations
{
"query": {
"bool": {
"filter": [
{
"terms": {
"text": [
"blue",
"red"
]
}
}
]
}
},
"aggs": {
"per_patient_id": {
"terms": {
"field": "patient_id"
},
"aggs": {
"per_text_distinct_values": {
"terms": {
"field": "text"
},
"aggs": {
"per_id": {
"terms": {
"field": "id"
}
}
}
},
"text_distinct_values_filter": {
"bucket_selector": {
"buckets_path": {
"text_distinct_values": "per_text_distinct_values._bucket_count"
},
"script": "params.text_distinct_values == 2"
}
}
}
}
}
}
Response
{
"aggregations" : {
"per_patient_id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "4711",
"doc_count" : 2,
"per_text_distinct_values" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "blue",
"doc_count" : 1,
"per_id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "1",
"doc_count" : 1
}
]
}
},
{
"key" : "red",
"doc_count" : 1,
"per_id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "2",
"doc_count" : 1
}
]
}
}
]
}
},
{
"key" : "4713",
"doc_count" : 2,
"per_text_distinct_values" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "blue",
"doc_count" : 1,
"per_id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "5",
"doc_count" : 1
}
]
}
},
{
"key" : "red",
"doc_count" : 1,
"per_id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "6",
"doc_count" : 1
}
]
}
}
]
}
}
]
}
}
}
Documents with id 1, 2, 5, 6 are in the response. Their contents can be recovered with help of bucket keys.
Upvotes: 0
Reputation: 496
As @Prabin Meitei wrote one solution strategy is to use collapsing search results. All you have to do is parse the search results and extract document identifiers and fields
Mapping
PUT /patient_text
{
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"patient_id": {
"type": "keyword"
},
"text": {
"type": "keyword"
}
}
}
}
Documents based on original ones
PUT /patient_text/_bulk
{"create":{"_id":1}}
{"patient_id": 4711, "text": "blue" }
{"create":{"_id":2}}
{"patient_id": 4711, "text": "red" }
{"create":{"_id":3}}
{"patient_id": 4712, "text": "blue" }
{"create":{"_id":4}}
{"patient_id": 4712, "text": "blue" }
{"create":{"_id":5}}
{"patient_id": 4713, "text": "blue" }
{"create":{"_id":6}}
{"patient_id": 4713, "text": "red" }
{"create":{"_id":7}}
{"patient_id": 4714, "text": "green" }
{"create":{"_id":8}}
{"patient_id": 4714, "text": "black" }
Search query with collapsing results
GET /patient_text/_search?filter_path=hits.hits.fields,hits.hits.inner_hits.texts.hits.hits.fields
{
"query": {
"terms": {
"text": [
"blue",
"red"
]
}
},
"collapse": {
"field": "patient_id",
"inner_hits": {
"name": "texts",
"fields": [
"text"
],
"_source": false
}
},
"_source": false
}
Response
{
"hits" : {
"hits" : [
{
"fields" : {
"patient_id" : [
"4711"
]
},
"inner_hits" : {
"texts" : {
"hits" : {
"hits" : [
{
"fields" : {
"text" : [
"blue"
]
}
},
{
"fields" : {
"text" : [
"red"
]
}
}
]
}
}
}
},
{
"fields" : {
"patient_id" : [
"4712"
]
},
"inner_hits" : {
"texts" : {
"hits" : {
"hits" : [
{
"fields" : {
"text" : [
"blue"
]
}
},
{
"fields" : {
"text" : [
"blue"
]
}
}
]
}
}
}
},
{
"fields" : {
"patient_id" : [
"4713"
]
},
"inner_hits" : {
"texts" : {
"hits" : {
"hits" : [
{
"fields" : {
"text" : [
"blue"
]
}
},
{
"fields" : {
"text" : [
"red"
]
}
}
]
}
}
}
}
]
}
}
Upvotes: 0
Reputation: 394
How about changing the way you store data into elastisearch. Just store one document for one patient id, and keep text as array of all distinct colors assigned to that patient.
Upvotes: 1
Reputation: 2000
You can simply use bool query or bool filter
Example using bool filter
{
"filtered" : {
"query" : {
"match_all" : { }
},
"filter" : {
"bool" : {
"Must" : [
{
"term" : { "text" : "blue" }
},
{
"term" : { "text" : "red" }
}
]
}
}
}
}
Edit: misread the requirement:
You should be using field collapsing
Upvotes: 0