Robin Rieger

Reputation: 1194

Elasticsearch - DateTime mapping for 'Day of Week'

I have the following property in a class:

public DateTime InsertedTimeStamp { get; set; }

With the following mapping in ES:

"insertedTimeStamp":{
    "type":"date",
    "format":"yyyy-MM-dd'T'HH:mm:ssZ"
},

I would like to run an aggregation that returns all the data grouped by the day of the week, i.e. 'Monday', 'Tuesday', etc.

I understand I can use a 'script' in the aggregation call to do this (see here); however, from my understanding, using a script has a significant performance impact when there are a lot of documents, which is anticipated here (think analytics logging).

Is there a way I can map the property with 'sub-properties'? For example, with a string I can do:

"somestring":{
    "type":"string",
    "analyzer":"full_word",
    "fields":{
        "partial":{
            "search_analyzer":"full_word",
            "analyzer":"partial_word",
            "type":"string"
        },
        "partial_back":{
            "search_analyzer":"full_word",
            "analyzer":"partial_word_back",
            "type":"string"
        },
        "partial_middle":{
            "search_analyzer":"full_word",
            "analyzer":"partial_word_name",
            "type":"string"
        }
    }
},

all with a single property in the class in the .NET code.

Can I do something similar to store the full date and then the year, month, day, etc. separately (some sort of 'script' at index time), or will I need to add more properties to the class and map them individually? Is this what Transform did? (It is now deprecated, which seems to indicate I need separate fields...)
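For reference, the script-based aggregation I'm trying to avoid would look roughly like this (index name and exact script syntax are assumptions, not tested):

```json
POST myindex/_search
{
  "size": 0,
  "aggs": {
    "by_day_of_week": {
      "terms": {
        "script": "doc['insertedTimeStamp'].date.dayOfWeek"
      }
    }
  }
}
```

This runs the script against every matching document at query time, which is exactly the per-document cost I'd like to move to index time.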

Upvotes: 3

Views: 2457

Answers (2)

Val
Val

Reputation: 217344

It is definitely possible to do it at indexing time using a pattern_capture token filter.

You'd first define one analyzer + token filter combo per date part and assign each to a sub-field of your date field. Each token filter captures only the group it is interested in.

{
  "settings": {
    "analysis": {
      "analyzer": {
        "year_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "year"
          ]
        },
        "month_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "month"
          ]
        },
        "day_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "day"
          ]
        },
        "hour_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "hour"
          ]
        },
        "minute_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "minute"
          ]
        },
        "second_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "second"
          ]
        }
      },
      "filter": {
        "year": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "(\\d{4})-\\d{2}-\\d{2}[tT]\\d{2}:\\d{2}:\\d{2}[zZ]"
          ]
        },
        "month": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-(\\d{2})-\\d{2}[tT]\\d{2}:\\d{2}:\\d{2}[zZ]"
          ]
        },
        "day": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-(\\d{2})[tT]\\d{2}:\\d{2}:\\d{2}[zZ]"
          ]
        },
        "hour": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-\\d{2}[tT](\\d{2}):\\d{2}:\\d{2}[zZ]"
          ]
        },
        "minute": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-\\d{2}[tT]\\d{2}:(\\d{2}):\\d{2}[zZ]"
          ]
        },
        "second": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-\\d{2}[tT]\\d{2}:\\d{2}:(\\d{2})[zZ]"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd'T'HH:mm:ssZ",
          "fields": {
            "year": {
              "type": "string",
              "analyzer": "year_analyzer"
            },
            "month": {
              "type": "string",
              "analyzer": "month_analyzer"
            },
            "day": {
              "type": "string",
              "analyzer": "day_analyzer"
            },
            "hour": {
              "type": "string",
              "analyzer": "hour_analyzer"
            },
            "minute": {
              "type": "string",
              "analyzer": "minute_analyzer"
            },
            "second": {
              "type": "string",
              "analyzer": "second_analyzer"
            }
          }
        }
      }
    }
  }
}

Then when you index a date such as 2016-01-22T10:01:23Z, you'll get each of the date sub-fields populated with the relevant part, i.e.

  • date: 2016-01-22T10:01:23Z
  • date.year: 2016
  • date.month: 01
  • date.day: 22
  • date.hour: 10
  • date.minute: 01
  • date.second: 23

You're then free to aggregate on any of those sub-fields to get what you want.
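For example, a terms aggregation on the day sub-field might look like this (a sketch; index name taken from the mapping above):

```json
POST test/_search
{
  "size": 0,
  "aggs": {
    "by_day": {
      "terms": {
        "field": "date.day"
      }
    }
  }
}
```

The same shape works for `date.year`, `date.month`, and the other sub-fields.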

Upvotes: 6

ChintanShah25

Reputation: 12672

I think your only option is a scripted upsert, which allows you to run scripts while indexing.

I created a basic index like this:

POST user_index
{
  "mappings": {
    "users": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format" : "yyyy-MM-dd'T'HH:mm:ssZ"
        },
        "month":{
          "type" : "string"
        },
        "day_of_week" : {
          "type" : "string"
        },
        "name" : {
          "type" : "string"
        }
      }
    }
  }
}

Then you index your documents like this:

POST user_index/users/111/_update/
{
  "scripted_upsert": true,
  "script": "ctx._source.month = DateTime.parse('2014-03-01T10:30:00').toString('MMMM');ctx._source.day_of_week = DateTime.parse('2014-03-01T10:30:00').dayOfWeek().getAsText()",
  "upsert": {
    "name": "Brad Smith",
    "timestamp": "2014-03-01T10:30:00Z"
  }
}

It will index the document like this (more on datetime manipulation in the Joda-Time docs):

 {
     "_index": "user_index",
     "_type": "users",
     "_id": "111",
     "_score": 1,
     "_source": {
         "timestamp": "2014-03-01T10:30:00Z",
         "day_of_week": "Saturday",
         "name": "Brad Smith",
         "month": "March"
     }
 }

Now you can perform aggregations with ease. Note that you will have to enable dynamic scripting for this; a better approach is to put the script in the config/scripts folder and pass the timestamp as a param. Depending on your requirements, you might also want to move all of the logic into the script.
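For example, grouping by day of week then becomes a plain terms aggregation (a sketch against the index above; since day_of_week is an analyzed string, the buckets will come back lowercased unless you map it as not_analyzed):

```json
POST user_index/_search
{
  "size": 0,
  "aggs": {
    "by_day_of_week": {
      "terms": {
        "field": "day_of_week"
      }
    }
  }
}
```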

Hope this helps!!

Upvotes: 2
