Reputation: 6741
I want to store time-related data in ElasticSearch. Each document has a start time and an end time which define when it was relevant (an open ticket for example). Then I want to be able to run queries such as:
Since I'll have monthly indexes, documents spanning more than a month will be stored in more than one index.
For example, a ticket opened between January and April needs to be stored in the indexes of all 4 months' indexes.
I was wondering if there is an easy way to run later on aggregations across the indexes which will know to take each ticket only once into account (for faceting and such).
For example, if I have the following tickets:
We will have the following documents/indexes:
Index January:
{id: A, StartDate: 1/1/2021, EndDate: 17/1/2021}
{id: B, StartDate: 15/1/2021}
{id: C, StartDate: 12/1/2021}
Index February:
{id: B, StartDate: 15/1/2021, EndDate: 16/2/2021}
{id: C, StartDate: 12/1/2021}
Index March:
{id: C, StartDate: 12/1/2021, endDate: 16/3/2021}
Any inputs related to the best way to implement such a thing with ElasticSearch are most welcomed. If there are other preferred databases for the job with high scale and fast aggregations I'll be happy to hear on alternatives as well.
Upvotes: 1
Views: 583
Reputation: 16895
This'll be pretty straightforward if both of these conditions are fulfilled:
StartDate
value.EndDate
also has a StartDate
.So, if we set up an index template covering all our future monthly indices:
PUT _index_template/monthly-indices
{
"index_patterns": [
"index-2021-*"
],
"template": {
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"StartDate": {
"type": "date",
"format": "d/M/yyyy"
},
"EndDate": {
"type": "date",
"format": "d/M/yyyy"
}
}
}
}
}
we can then add the sample docs from your question:
POST index-2021-01/_doc/A
{
"id": "A",
"StartDate": "1/1/2021",
"EndDate": "17/1/2021"
}
POST index-2021-01/_doc/B
{
"id": "B",
"StartDate": "15/1/2021"
}
POST index-2021-02/_doc/B
{
"id": "B",
"StartDate": "15/1/2021",
"EndDate": "16/2/2021"
}
POST index-2021-02/_doc/C
{
"id": "C",
"StartDate": "12/2/2021"
}
POST index-2021-03/_doc/C
{
"id": "C",
"StartDate": "12/2/2021",
"EndDate": "16/3/2021"
}
After that, the question "Which tickets were open between 1/1/2021 and 31/3/2021?" can be then answered through:
POST index-2021-*/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"StartDate": {
"gte": "1/1/2021"
}
}
},
{
"range": {
"EndDate": {
"lte": "31/3/2021"
}
}
}
]
}
}
}
which returns the docs A
, B
, and C
-- but obviously only those entries that have both start & end dates.
Moving forward, a simple "facet" statistic denoting "how many tickets were open month-over-month" can be constructed using this scripted date_histogram
aggregation:
POST index-2021-*/_search
{
"size": 0,
"aggs": {
"open_count_by_months": {
"date_histogram": {
"interval": "month",
"format": "MMMM yyyy",
"script": """
def start = doc['StartDate'].value;
def end = doc['EndDate'].size() == 0 ? null : doc['EndDate'].value;
if (end == null) {
return start
}
return end
"""
}
}
}
}
yielding
"aggregations" : {
"open_count_by_months" : {
"buckets" : [
{
"key_as_string" : "January 2021",
"key" : 1609459200000,
"doc_count" : 2
},
{
"key_as_string" : "February 2021",
"key" : 1612137600000,
"doc_count" : 2
},
{
"key_as_string" : "March 2021",
"key" : 1614556800000,
"doc_count" : 1
}
]
}
}
Wrapping up, if you manage to keep solid control over the docs' content "spread" across multiple indices, you'll be OK.
If, however, you just let a cronjob sync the tickets every day and construct your docs such that they act like messages "announcing" the current ticket status, i.e.:
{
"ticket_id": A,
"status": "OPEN"
"timestamp": 1613428738570
}
it'll introduce some nasty complexity that I discussed a while back here and here.
Upvotes: 2