Robin Rieger

Reputation: 1194

Elasticsearch - DateTime mapping for 'Day of Week'

I have the following property in a class:

public DateTime InsertedTimeStamp { get; set; }

With the following mapping in ES:

"insertedTimeStamp":{
    "type":"date",
    "format":"yyyy-MM-dd'T'HH:mm:ssZ"
},

I would like to run an aggregation that returns all the data grouped by the day of the week, i.e. 'Monday', 'Tuesday', etc.

I understand I can use a 'script' in the aggregation call to do this (see here); however, from my understanding, using a script has a significant performance impact when there are a lot of documents, which is anticipated here (think analytics logging).

Is there a way I can map the property with 'sub-properties'? For example, with a string I can do:

"somestring":{
    "type":"string",
    "analyzer":"full_word",
    "fields":{
        "partial":{
            "search_analyzer":"full_word",
            "analyzer":"partial_word",
            "type":"string"
        },
        "partial_back":{
            "search_analyzer":"full_word",
            "analyzer":"partial_word_back",
            "type":"string"
        },
        "partial_middle":{
            "search_analyzer":"full_word",
            "analyzer":"partial_word_name",
            "type":"string"
        }
    }
},

all with a single property in the class in the .NET code.

Can I do something similar to store the full date and then the year, month, day, etc. separately (some sort of 'script' at index time), or will I need to add more properties to the class and map them individually? Is this what Transform did? (It is now deprecated, which seems to indicate I need separate fields...)
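For reference, the script-based aggregation I'm trying to avoid would look roughly like this (index name and exact script syntax are assumptions, not tested):

```json
POST myindex/_search
{
  "size": 0,
  "aggs": {
    "by_day_of_week": {
      "terms": {
        "script": "doc['insertedTimeStamp'].date.dayOfWeek"
      }
    }
  }
}
```

This runs the script against every matching document at query time, which is exactly the per-document cost I'd like to move to index time.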

Upvotes: 3

Views: 2457

Answers (2)

Val
Val

Reputation: 217344

It is definitely possible to do it at indexing time using a pattern_capture token filter.

You'd first define one analyzer + token filter combo per date part and assign each to a sub-field of your date field. Each token filter captures only the group it is interested in.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "year_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "year"
          ]
        },
        "month_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "month"
          ]
        },
        "day_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "day"
          ]
        },
        "hour_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "hour"
          ]
        },
        "minute_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "minute"
          ]
        },
        "second_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "second"
          ]
        }
      },
      "filter": {
        "year": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "(\\d{4})-\\d{2}-\\d{2}[tT]\\d{2}:\\d{2}:\\d{2}[zZ]"
          ]
        },
        "month": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-(\\d{2})-\\d{2}[tT]\\d{2}:\\d{2}:\\d{2}[zZ]"
          ]
        },
        "day": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-(\\d{2})[tT]\\d{2}:\\d{2}:\\d{2}[zZ]"
          ]
        },
        "hour": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-\\d{2}[tT](\\d{2}):\\d{2}:\\d{2}[zZ]"
          ]
        },
        "minute": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-\\d{2}[tT]\\d{2}:(\\d{2}):\\d{2}[zZ]"
          ]
        },
        "second": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-\\d{2}[tT]\\d{2}:\\d{2}:(\\d{2})[zZ]"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd'T'HH:mm:ssZ",
          "fields": {
            "year": {
              "type": "string",
              "analyzer": "year_analyzer"
            },
            "month": {
              "type": "string",
              "analyzer": "month_analyzer"
            },
            "day": {
              "type": "string",
              "analyzer": "day_analyzer"
            },
            "hour": {
              "type": "string",
              "analyzer": "hour_analyzer"
            },
            "minute": {
              "type": "string",
              "analyzer": "minute_analyzer"
            },
            "second": {
              "type": "string",
              "analyzer": "second_analyzer"
            }
          }
        }
      }
    }
  }
}

Then when you index a date such as 2016-01-22T10:01:23Z, you'll get each of the date sub-fields populated with the relevant part, i.e.

  • date: 2016-01-22T10:01:23Z
  • date.year: 2016
  • date.month: 01
  • date.day: 22
  • date.hour: 10
  • date.minute: 01
  • date.second: 23

You're then free to aggregate on any of those sub-fields to get what you want.
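For example, a terms aggregation on the day sub-field might look like this (a sketch; index name taken from the mapping above):

```json
POST test/_search
{
  "size": 0,
  "aggs": {
    "by_day": {
      "terms": {
        "field": "date.day"
      }
    }
  }
}
```

The same shape works for `date.year`, `date.month`, and the other sub-fields.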

Upvotes: 6

ChintanShah25

Reputation: 12672

I think your only option is a scripted upsert, which allows you to run scripts while indexing.

I created a basic index like this:

POST user_index
{
  "mappings": {
    "users": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format" : "yyyy-MM-dd'T'HH:mm:ssZ"
        },
        "month":{
          "type" : "string"
        },
        "day_of_week" : {
          "type" : "string"
        },
        "name" : {
          "type" : "string"
        }
      }
    }
  }
}

Then you index your documents like this:

POST user_index/users/111/_update/
{
  "scripted_upsert": true,
  "script": "ctx._source.month = DateTime.parse('2014-03-01T10:30:00').toString('MMMM');ctx._source.day_of_week = DateTime.parse('2014-03-01T10:30:00').dayOfWeek().getAsText()",
  "upsert": {
    "name": "Brad Smith",
    "timestamp": "2014-03-01T10:30:00Z"
  }
}

It will index the document like this (more on datetime manipulation in the Joda-Time docs):

 {
     "_index": "user_index",
     "_type": "users",
     "_id": "111",
     "_score": 1,
     "_source": {
         "timestamp": "2014-03-01T10:30:00Z",
         "day_of_week": "Saturday",
         "name": "Brad Smith",
         "month": "March"
     }
 }

Now you can perform aggregations with ease. Note that you will have to enable dynamic scripting for this; a better approach is to put the script in the config/scripts folder and pass the timestamp as a param. Depending on your requirements, you might also want to move all of the logic into the script.
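For example, grouping by day of week then becomes a plain terms aggregation (a sketch against the index above; since day_of_week is an analyzed string, the buckets will come back lowercased unless you map it as not_analyzed):

```json
POST user_index/_search
{
  "size": 0,
  "aggs": {
    "by_day_of_week": {
      "terms": {
        "field": "day_of_week"
      }
    }
  }
}
```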

Hope this helps!!

Upvotes: 2
