Reputation: 3
I have an Elasticsearch index that stores documents with the following fields:
I need to find 5 unique user_id values from the most recent and highest-priority documents. If I were working with SQL, I would solve this by:
SELECT DISTINCT user_id
FROM (
SELECT user_id
FROM documents
ORDER BY timestamp DESC, priority DESC
) AS sorted_docs
LIMIT 5;
However, I am unsure how to perform a similar operation in Elasticsearch. I have looked into using terms and composite aggregations, but I'm not clear on:
Can someone guide me on how to structure this query or provide an example? Any help or references would be greatly appreciated!
Upvotes: 0
Views: 17
Reputation: 5486
You have not shared any sample data but based on the information you have given, I have created sample data and provided sample query.
You can use Collapse functionality from Elasticsearch to get distinct value for user_id.
Let consider below is your index mapping:
PUT test
{
"mappings": {
"properties": {
"user_id":{
"type": "keyword"
},
"timestamp":{
"type": "date"
},
"priority":{
"type": "integer"
}
}
}
}
Below is sample data:
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "user_id" : "101", "timestamp" : "2024-11-05T12:10:30Z","priority" : "1"}
{ "index" : { "_index" : "test", "_id" : "2" } }
{ "user_id" : "101", "timestamp" : "2024-11-05T16:10:30Z","priority" : "2"}
{ "index" : { "_index" : "test", "_id" : "3" } }
{ "user_id" : "102", "timestamp" : "2024-11-04T12:10:30Z","priority" : "3"}
{ "index" : { "_index" : "test", "_id" : "4" } }
{ "user_id" : "102", "timestamp" : "2024-11-05T08:10:30Z","priority" : "4"}
{ "index" : { "_index" : "test", "_id" : "5" } }
{ "user_id" : "103", "timestamp" : "2024-11-05T12:10:30Z","priority" : "5"}
{ "index" : { "_index" : "test", "_id" : "6" } }
{ "user_id" : "103", "timestamp" : "2024-11-03T12:10:30Z","priority" : "6"}
{ "index" : { "_index" : "test", "_id" : "7" } }
{ "user_id" : "104", "timestamp" : "2024-11-05T12:10:30Z","priority" : "7"}
{ "index" : { "_index" : "test", "_id" : "8" } }
{ "user_id" : "105", "timestamp" : "2024-11-04T12:10:30Z","priority" : "8"}
{ "index" : { "_index" : "test", "_id" : "9" } }
{ "user_id" : "105", "timestamp" : "2024-11-01T12:10:30Z","priority" : "9"}
{ "index" : { "_index" : "test", "_id" : "10" } }
{ "user_id" : "106", "timestamp" : "2024-11-02T12:10:30Z","priority" : "10"}
You can use below query which will sort on timestamp and priority as well as collapse on user_id which will make sure you will get all the unique user ids:
POST test/_search
{
"size": 5,
"sort": [
{
"timestamp": {
"order": "desc"
}
},
{
"priority": {
"order": "desc"
}
}
],
"query": {
"match_all": {}
},
"collapse": {
"field": "user_id"
}
}
Below will be response for above query:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10,
"relation": "eq"
},
"max_score": null,
"hits": [
{
"_index": "test",
"_id": "2",
"_score": null,
"_source": {
"user_id": "101",
"timestamp": "2024-11-05T16:10:30Z",
"priority": "2"
},
"fields": {
"user_id": [
"101"
]
},
"sort": [
1730823030000,
2
]
},
{
"_index": "test",
"_id": "7",
"_score": null,
"_source": {
"user_id": "104",
"timestamp": "2024-11-05T12:10:30Z",
"priority": "7"
},
"fields": {
"user_id": [
"104"
]
},
"sort": [
1730808630000,
7
]
},
{
"_index": "test",
"_id": "5",
"_score": null,
"_source": {
"user_id": "103",
"timestamp": "2024-11-05T12:10:30Z",
"priority": "5"
},
"fields": {
"user_id": [
"103"
]
},
"sort": [
1730808630000,
5
]
},
{
"_index": "test",
"_id": "4",
"_score": null,
"_source": {
"user_id": "102",
"timestamp": "2024-11-05T08:10:30Z",
"priority": "4"
},
"fields": {
"user_id": [
"102"
]
},
"sort": [
1730794230000,
4
]
},
{
"_index": "test",
"_id": "8",
"_score": null,
"_source": {
"user_id": "105",
"timestamp": "2024-11-04T12:10:30Z",
"priority": "8"
},
"fields": {
"user_id": [
"105"
]
},
"sort": [
1730722230000,
8
]
}
]
}
}
I have consider 10 as highest priority as you didn't clarified in your question. If you want 1 as highest priority then change priority order to asc
from desc
in sort.
Upvotes: 0