Jimmy

Reputation: 12487

Elasticsearch setting format for custom date

This is my date format:

10:00 2019-06-03

According to the Elasticsearch documentation, I can do this:

{
  "mappings": {
    "properties": {
      "date": {
        "type":   "date",
        "format": "HH:mm yyyy-MM-dd"
      }
    }
  }
}

However, when I do this, it doesn't recognise this as a date (and therefore doesn't convert it to a timestamp). Does anyone understand why?

https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatter.html

Upvotes: 2

Views: 160

Answers (1)

Kamal Kunjapur

Reputation: 8840

Let's say we have the mapping below for the date field from your question:

PUT <your_index_name>
{  
   "mappings":{  
      "properties":{  
         "date":{  
            "type":"date",
            "format":"HH:mm yyyy-MM-dd||yyyy-MM-dd HH:mm"
         }
      }
   }
}

Notice that I've specified two different date formats, separated by ||.

Let me add two documents now:

POST mydate/_doc/1
{
  "date": "10:00 2019-06-03"
}

POST mydate/_doc/2
{
  "date": "2019-06-03 10:00"
}

Notice the two date values above. Semantically, they mean exactly the same thing, and this has to be preserved while querying.

So if a user searches based on the semantic meaning of a date value, they should get both documents back.

POST <your_index_name>/_search
{
  "query": {
    "match": {
      "date": "10:00 2019-06-03"
    }
  }
}

Response:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "mydate",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "date" : "10:00 2019-06-03"
        }
      },
      {
        "_index" : "mydate",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "date" : "2019-06-03 10:00"
        }
      }
    ]
  }
}

That is exactly what we observe in the response: both documents are returned.

This is only possible if both values are stored in exactly the same way. In the index, both values are stored as the same long number.
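To make that concrete, here is a minimal Python sketch (not Elasticsearch's actual implementation) that mimics the HH:mm yyyy-MM-dd||yyyy-MM-dd HH:mm mapping: it tries each pattern in turn and converts the parsed value to milliseconds since the epoch. The function name to_epoch_millis and the strptime patterns are my own stand-ins for the mapping format, and I'm assuming UTC because the values carry no time zone.

```python
from datetime import datetime, timezone

# Assumed strptime equivalents of the two patterns in the mapping:
# "HH:mm yyyy-MM-dd" and "yyyy-MM-dd HH:mm"
PATTERNS = ["%H:%M %Y-%m-%d", "%Y-%m-%d %H:%M"]


def to_epoch_millis(value):
    """Try each pattern in order and return UTC milliseconds since the epoch."""
    for pattern in PATTERNS:
        try:
            parsed = datetime.strptime(value, pattern)
        except ValueError:
            continue  # this pattern does not fit, try the next one
        # No zone in the input, so treat the value as UTC (assumption)
        return int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)
    raise ValueError(f"no pattern matches {value!r}")


a = to_epoch_millis("10:00 2019-06-03")
b = to_epoch_millis("2019-06-03 10:00")
print(a, b, a == b)
```

Both strings end up as the same number, which is why a query on either form can match both documents.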

Without that semantic definition, these values would be nothing more than plain strings, and as strings 10:00 2019-06-03 and 2019-06-03 10:00 are different. (If date fields behaved like that, there would be no point in having a date datatype at all.)

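The format we specify in the mapping only controls how date strings are parsed when indexing and querying, and how date values are rendered back to the user. It does not change how the value is stored, and it does not rewrite the _source you sent.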

Note the following from the Elasticsearch documentation on the date datatype:

Internally, dates are converted to UTC (if the time-zone is specified) and stored as a long number representing milliseconds-since-the-epoch.

Queries on dates are internally converted to range queries on this long representation, and the result of aggregations and stored fields is converted back to a string depending on the date format that is associated with the field.
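As a rough illustration of that last paragraph, the Python sketch below models a date query as a range check on stored long values and then renders a stored value back to a string. It is only a toy model under my own assumptions: the stored dictionary, range_match, and render are hypothetical names, the strptime pattern is my translation of HH:mm yyyy-MM-dd, and I'm assuming UTC and that the first format in the mapping is the one used for rendering.

```python
from datetime import datetime, timezone

# Assumed strptime equivalent of the first mapping format, "HH:mm yyyy-MM-dd"
FIRST_PATTERN = "%H:%M %Y-%m-%d"

# Hypothetical stored values: doc id -> epoch millis (same instant for both docs)
stored = {"1": 1559556000000, "2": 1559556000000}


def range_match(lo, hi):
    """Return ids of documents whose stored long falls within [lo, hi]."""
    return sorted(doc_id for doc_id, millis in stored.items() if lo <= millis <= hi)


def render(millis):
    """Convert a stored long back to a string using the first pattern."""
    moment = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return moment.strftime(FIRST_PATTERN)


# The query string is parsed to a long, then matched as a range on that long
parsed = datetime.strptime("10:00 2019-06-03", FIRST_PATTERN)
query = int(parsed.replace(tzinfo=timezone.utc).timestamp() * 1000)

hits = range_match(query, query)
rendered = render(stored["2"])
print(hits, rendered)
```

Document 2 was sent as 2019-06-03 10:00, but the rendered value follows the first pattern. The _source in the search response is still returned exactly as it was indexed, which is why you don't see a timestamp there.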

Hope this helps!

Upvotes: 2
