Jordan Ell

Reputation: 1805

How to create a histogram in elasticsearch which takes into account date ranges

Problem

I have data in the following schema:

{
  start_date: '2017-01-01',
  end_date: '2017-01-05',
},
{
  start_date: '2017-01-03',
  end_date: '2017-01-07',
}

I am trying to create a per-day histogram that counts, for each day, the documents whose start_date to end_date range overlaps that day.

With the data above, the output buckets would be:

{
  "2017-01-01": { "doc_count": 1 },
  "2017-01-02": { "doc_count": 1 },
  "2017-01-03": { "doc_count": 2 },
  "2017-01-04": { "doc_count": 2 },
  "2017-01-05": { "doc_count": 2 },
  "2017-01-06": { "doc_count": 1 },
  "2017-01-07": { "doc_count": 1 }
}
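
To pin down the counting rule: a document belongs in a day's bucket when start_date <= day <= end_date. The sketch below is a plain in-memory reference for that rule, not an Elasticsearch query. The name countOverlaps is hypothetical, and it assumes dates are 'YYYY-MM-DD' strings, which compare correctly as plain strings.

```javascript
// Sample documents from the question.
const docs = [
  { start_date: '2017-01-01', end_date: '2017-01-05' },
  { start_date: '2017-01-03', end_date: '2017-01-07' },
];

// Hypothetical in-memory reference (not Elasticsearch): counts, per day,
// the documents whose [start_date, end_date] range contains that day.
// Assumes 'YYYY-MM-DD' strings, so string comparison matches date order.
const countOverlaps = (documents, days) => {
  const buckets = {};
  for (const day of days) {
    const matching = documents.filter(
      (doc) => doc.start_date <= day && day <= doc.end_date
    );
    buckets[day] = { doc_count: matching.length };
  }
  return buckets;
};

const days = [
  '2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04',
  '2017-01-05', '2017-01-06', '2017-01-07',
];
const expectedBuckets = countOverlaps(docs, days);
```

For the two sample documents this reproduces the buckets listed above, which makes it usable as a check against whatever the real aggregation returns.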

After reading all of the elasticsearch aggregation docs, I do not see how this would be possible. Any help is appreciated.


Solution

Expanding on Olivier's answer below, I did the following:

Created a helper function to generate all days, inclusive, between a start and end date:

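const moment = require('moment'); // or `import moment from 'moment'` with ES modules

// Returns every day from start to end, inclusive, as 'YYYY-MM-DD' strings.
const generateDateRange = (start, end) => {
  const endDate = moment(end);
  const range = [];

  // moment objects are mutable, so `date` is advanced in place.
  const date = moment(start);
  while (date.isSameOrBefore(endDate)) {
    range.push(date.format('YYYY-MM-DD'));
    date.add(1, 'day');
  }

  return range;
};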
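
If moment is not available, roughly the same inclusive range can be produced with the built-in Date object. This is only a sketch under assumptions, not the helper used above: the name generateDateRangeUtc is hypothetical, inputs are assumed to be 'YYYY-MM-DD' strings, and every date is treated as UTC midnight to avoid daylight-saving surprises.

```javascript
// Hypothetical moment-free variant of the helper.
// Assumes 'YYYY-MM-DD' inputs and interprets them as UTC midnight.
const generateDateRangeUtc = (start, end) => {
  const range = [];
  const cursor = new Date(`${start}T00:00:00Z`);
  const last = new Date(`${end}T00:00:00Z`);

  // Date objects compare by timestamp, so <= keeps the end day inclusive.
  while (cursor <= last) {
    range.push(cursor.toISOString().slice(0, 10)); // keep only YYYY-MM-DD
    cursor.setUTCDate(cursor.getUTCDate() + 1); // rolls over month ends
  }

  return range;
};

const sampleRange = generateDateRangeUtc('2017-01-30', '2017-02-01');
```

An end date earlier than the start date simply yields an empty array.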

Created a helper function to generate all filters needed for the aggregation based on the date range:

// Builds one named filter per day. A document matches a day when
// start <= day <= end; options.start and options.end are the field names.
const generateActivityFilters = (range, options = {}) => {
  const filters = {};

  range.forEach((date) => {
    filters[date] = {
      bool: {
        filter: [
          { range: { [options.start]: { lte: date } } },
          { range: { [options.end]: { gte: date } } },
        ],
      },
    };
  });

  return filters;
};

Finally, ran the query as follows:

{
  "size": 0, 
  "aggs": {
    "date_histo": {
      "filters": {
        "filters": filters // from generateActivityFilters
      }
    }
  }
}
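
With named filters, the filters aggregation returns its buckets as an object keyed by filter name, so the response already has the shape asked for in the question. The sketch below shows one way to wrap the filters in the request body and flatten the response. The helper names buildHistogramBody and readHistogram are hypothetical, and the response literal is hand-written for illustration rather than captured from a cluster.

```javascript
// Hypothetical helper: wraps a filters map in the request body shown above.
const buildHistogramBody = (filters) => ({
  size: 0,
  aggs: {
    date_histo: {
      filters: { filters },
    },
  },
});

// Hypothetical helper: flattens the keyed buckets into a list sorted by date.
// Sorting client-side avoids relying on the key order of the response.
const readHistogram = (response) =>
  Object.entries(response.aggregations.date_histo.buckets)
    .map(([date, bucket]) => ({ date, count: bucket.doc_count }))
    .sort((a, b) => a.date.localeCompare(b.date));

// Hand-written response fragment, for illustration only.
const sampleResponse = {
  aggregations: {
    date_histo: {
      buckets: {
        '2017-01-03': { doc_count: 2 },
        '2017-01-01': { doc_count: 1 },
        '2017-01-02': { doc_count: 1 },
      },
    },
  },
};

const histogram = readHistogram(sampleResponse);
```

The body returned by buildHistogramBody can be passed as the search request body by whichever Elasticsearch client is in use.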

The only alternative I see is doing the whole operation in a script instead, but after a couple of hours of playing with Elasticsearch scripts I gave up on that approach.

Upvotes: 4

Views: 536

Answers (1)

Olivier

Reputation: 2681

I find the question quite interesting.

My own searching didn't turn up a reasonable way of achieving this. One reason is that it is unclear how you would define the start and end dates for your date histogram, since it typically relies on the field parameter to work this out.

Someone more experienced with bucket and pipeline aggregations might be able to help out, but the closest I got was to "cheat" and build a filters aggregation to achieve the goal:

{
  "size": 0, 
  "aggs": {
    "date_histo": {
      "filters": {
        "filters": {
          "2017-01-01": {
            "bool": {
              "filter": [
                {"range": {"start_date": {"lte": "2017-01-01"}}},
                {"range": {"end_date": {"gte": "2017-01-01"}}}
              ]
            }
          },
          "2017-01-02": {
            "bool": {
              "filter": [
                {"range": {"start_date": {"lte": "2017-01-02"}}},
                {"range": {"end_date": {"gte": "2017-01-02"}}}
              ]
            }
          },
          ...
        }
      }
    }
  }
}

Not very pretty, but it might still be worth considering as a starting point for a better answer.

Upvotes: 2
