JustCode
JustCode

Reputation: 61

Filter data by day range in Elasticsearch using Python DSL

I wrote this below method to filter data for last 8 days

def method_one(query) -> Query:
    gte = (datetime.datetime.now() - datetime.timedelta(days=query)).date()
    lt = (datetime.datetime.now()).date()
    print(gte, lt)
    return Q(MultiMatch(
        query=filter("range", {"lastModifiedDate": {"gte": gte, "lt": lt}}
                        ),
        fields=['lastModifiedDate']
    ))

I want to filter data based on the lastModifiedDate field by forming an Elasticsearch Query object in Python. For example if I give /lastModifiedDate=8 (Rest API Call), it should return data by filtering for the last 8 days.

Upvotes: 0

Views: 2351

Answers (1)

Joe - Check out my books
Joe - Check out my books

Reputation: 16925

You don't need the datetime module to construct date queries in Elasticsearch -- you can use built-in date math:

from json import dumps
from elasticsearch_dsl.search import Search
from elasticsearch_dsl.query import Q, MultiMatch


def date_range_query(num_of_days):
    if not isinstance(num_of_days, int):
        raise Exception(
            'expected numeric & positive `num_of_days`, got `%s`' % str(num_of_days))

    return Q(
        "range",
        lastModifiedDate={
            "gte": "now-%dd" % num_of_days,
            "lt": "now"
        }
    )


try:
    q = date_range_query(8)
    print(dumps(q.to_dict(), indent=2))
except Exception as e:
    print(e)

which'd print

{
  "range": {
    "lastModifiedDate": {
      "gte": "now-8d",
      "lt": "now"
    }
  }
}

Alternatively, if you insisted on using datetime.date objects, you'd need to stringify the dates first. Now, when you do that with str(...), you're in fact calling .__str()__ which then calls .isoformat() and returns a string formatted as YYYY-MM-DD.

Now, your lastModifiedDate field's mapping may have a different format. As such, it's good practice to declare the format of your range query:

gte = (datetime.datetime.now() - datetime.timedelta(days=num_of_days)).date()
lt = (datetime.datetime.now()).date()

return Q(
    "range",
    lastModifiedDate={
        "gte": str(gte),
        "lt": str(lt),
        "format": "yyyy-MM-dd"  # keep in mind that the format in ES conforms to Java syntax, not python
    }
)

which'd produce a similar query but with concrete, spelled-out dates:

{
  "range": {
    "lastModifiedDate": {
      "gte": "2021-02-26",
      "lt": "2021-03-06",
      "format": "yyyy-MM-dd"
    }
  }
}

Upvotes: 4

Related Questions