Marriam

Reputation: 21

Error: failed to parse date field [2021-H1] with format [strict_date_optional_time||epoch_millis]

I am trying to fetch dates from Elasticsearch using unconventional date formats. My field in Elasticsearch accepts these dates because it has "ignore_malformed": true, but when I try to fetch these dates I get this error. My date formats are: YYYY-Q[1-4], YYYY-H[1-2], and YYYY-[EARLY|MID|LATE]. Can anyone suggest how I can retrieve these dates/ranges?

Upvotes: 2

Views: 3013

Answers (1)

Joe - Check out my books

Reputation: 16895

A couple of things here. Before we start, let's clarify the terminology: fetching is probably just fine; your issue has to do with ingesting (and subsequently parsing) your custom date fields.

Let's start with the most important point.

There's no standardized date format for year quarters, halves, or "seasons".

As discussed in the thread Is there a standard date format that expresses Quarters?, different companies divide their years differently.

That said, Elasticsearch will only be able to parse the year component of your date strings. In order to enforce the three custom formats from your question, you could define the following mapping:

PUT my-index/
{
  "mappings": {
    "properties": {
      "time_field": {
        "type": "date",
        "format": "uuuu-['Q'][1][2][3][4]||uuuu-['H1']['H2']||uuuu-['EARLY']['MID']['LATE']"
      }
    }
  }
}

BTW there's more on custom date formats in the Elasticsearch docs.

After that, you could attempt to ingest the three variants:

POST my-index/_doc
{
  "id": "2021-Q2",
  "time_field": "2021-Q2"
}

POST my-index/_doc
{
  "id": "2021-H1",
  "time_field": "2021-H1"
}

POST my-index/_doc
{
  "id": "2021-LATE",
  "time_field": "2021-LATE"
}

In order to verify how the dates were parsed internally, you could utilize script fields like so:

GET my-index/_search?filter_path=hits.hits._source.id,hits.hits.fields
{
  "_source": "id", 
  "script_fields": {
    "exploring_time_field": {
      "script": {
        "lang": "painless",
        "source": """
          LocalDateTime.ofInstant(
             Instant.ofEpochMilli(doc['time_field'].value.toInstant().toEpochMilli()),
             ZoneId.of('UTC')
          ).format(DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss"))
        """
      }
    }
  }
}

which would return:

{
  "hits" : {
    "hits" : [
      {
        "_source" : {
          "id" : "2021-Q2"
        },
        "fields" : {
          "exploring_time_field" : [
            "2021-01-01 00:00:00"
          ]
        }
      },
      {
        "_source" : {
          "id" : "2021-H1"
        },
        "fields" : {
          "exploring_time_field" : [
            "2021-01-01 00:00:00"
          ]
        }
      },
      {
        "_source" : {
          "id" : "2021-LATE"
        },
        "fields" : {
          "exploring_time_field" : [
            "2021-01-01 00:00:00"
          ]
        }
      }
    ]
  }
}

As you can see, since only the year component was extracted, the rest of the date time components were assigned the default values for missing components (Jan 1st 00:00:00).

This is certainly not what you're after, so your best bet is to parse the custom date strings before you ingest them into Elasticsearch.
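For example, here is a minimal Python pre-processing sketch that converts the custom strings into explicit start/end dates, which you could then index into a date_range field (or as two plain date fields). The period boundaries below are assumptions, not a standard — as noted above, companies divide their years differently, so adjust them to your calendar:

```python
from datetime import date, timedelta

# Assumed period boundaries (start_month, end_month) -- not a standard,
# adjust to your own fiscal/calendar conventions.
PERIODS = {
    "Q1": (1, 3), "Q2": (4, 6), "Q3": (7, 9), "Q4": (10, 12),
    "H1": (1, 6), "H2": (7, 12),
    "EARLY": (1, 4), "MID": (5, 8), "LATE": (9, 12),
}

def period_to_range(value):
    """Return (start_date, end_date) for a string like '2021-Q2'."""
    year_str, period = value.split("-")
    year = int(year_str)
    start_month, end_month = PERIODS[period]
    start = date(year, start_month, 1)
    # Last day of the end month: first day of the next month minus one day.
    if end_month == 12:
        end = date(year, 12, 31)
    else:
        end = date(year, end_month + 1, 1) - timedelta(days=1)
    return start, end
```

You could then ingest the result as e.g. `{"time_field": {"gte": str(start), "lte": str(end)}}` against a `date_range`-typed field, which preserves the full period instead of collapsing it to January 1st.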

Upvotes: 1
