Reputation: 1194
I have the following property in a class:
public DateTime InsertedTimeStamp { get; set; }
With the following mapping in Elasticsearch:
"insertedTimeStamp ":{
"type":"date",
"format":"yyyy-MM-ddTHH:mm:ssZ"
},
I would like to run an aggregation to return all the data grouped by the 'Day of the Week', i.e. 'Monday', 'Tuesday'...etc
I understand I can use a 'script' in the aggregation call to do this, see here, however, from my understanding, using a script has a significant performance impact if there are a lot of documents (which is anticipated here, think analytics logging).
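For reference, the script-based aggregation I'd like to avoid would look something like this (a sketch only; the index name is a placeholder and the exact script syntax depends on the ES version):

```
POST my_index/_search
{
  "size": 0,
  "aggs": {
    "by_day_of_week": {
      "terms": {
        "script": "doc['insertedTimeStamp'].date.dayOfWeek().getAsText()"
      }
    }
  }
}
```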
Is there a way I can map the property with 'sub-properties'? E.g. with a string I can do:
"somestring":{
"type":"string",
"analyzer":"full_word",
"fields":{
"partial":{
"search_analyzer":"full_word",
"analyzer":"partial_word",
"type":"string"
},
"partial_back":{
"search_analyzer":"full_word",
"analyzer":"partial_word_back",
"type":"string"
},
"partial_middle":{
"search_analyzer":"full_word",
"analyzer":"partial_word_name",
"type":"string"
}
}
},
All with a single property in the class in the .NET code.
Can I do something similar to store the 'full date' and then the 'year', 'month', 'day', etc. separately (some sort of 'script' at index time), or will I need to add more properties to the class and map them individually? Is this what Transform did? (It is now deprecated, which seems to indicate I need separate fields...)
Upvotes: 3
Views: 2457
Reputation: 217344
It is definitely possible to do this at indexing time using a pattern_capture token filter.
You'd first define one analyzer + token filter combo per date part and assign each to a sub-field of your date field. Each token filter will only capture the group it is interested in.
{
  "settings": {
    "analysis": {
      "analyzer": {
        "year_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "year" ]
        },
        "month_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "month" ]
        },
        "day_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "day" ]
        },
        "hour_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "hour" ]
        },
        "minute_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "minute" ]
        },
        "second_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "second" ]
        }
      },
      "filter": {
        "year": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "(\\d{4})-\\d{2}-\\d{2}[tT]\\d{2}:\\d{2}:\\d{2}[zZ]"
          ]
        },
        "month": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-(\\d{2})-\\d{2}[tT]\\d{2}:\\d{2}:\\d{2}[zZ]"
          ]
        },
        "day": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-(\\d{2})[tT]\\d{2}:\\d{2}:\\d{2}[zZ]"
          ]
        },
        "hour": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-\\d{2}[tT](\\d{2}):\\d{2}:\\d{2}[zZ]"
          ]
        },
        "minute": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-\\d{2}[tT]\\d{2}:(\\d{2}):\\d{2}[zZ]"
          ]
        },
        "second": {
          "type": "pattern_capture",
          "preserve_original": false,
          "patterns": [
            "\\d{4}-\\d{2}-\\d{2}[tT]\\d{2}:\\d{2}:(\\d{2})[zZ]"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "date": {
          "type": "date",
          "format": "yyyy-MM-dd'T'HH:mm:ssZ",
          "fields": {
            "year": {
              "type": "string",
              "analyzer": "year_analyzer"
            },
            "month": {
              "type": "string",
              "analyzer": "month_analyzer"
            },
            "day": {
              "type": "string",
              "analyzer": "day_analyzer"
            },
            "hour": {
              "type": "string",
              "analyzer": "hour_analyzer"
            },
            "minute": {
              "type": "string",
              "analyzer": "minute_analyzer"
            },
            "second": {
              "type": "string",
              "analyzer": "second_analyzer"
            }
          }
        }
      }
    }
  }
}
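Outside Elasticsearch, the capture patterns above can be sanity-checked with plain regexes (Python here, purely illustrative; it is the same pattern shape as the token filters, with one capture group per date part):

```python
import re

# One capture group per date part, mirroring the pattern_capture filters above.
date_re = re.compile(
    r"(\d{4})-(\d{2})-(\d{2})[tT](\d{2}):(\d{2}):(\d{2})[zZ]"
)

year, month, day, hour, minute, second = date_re.match(
    "2016-01-22T10:01:23Z"
).groups()
print(year, month, day, hour, minute, second)  # -> 2016 01 22 10 01 23
```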
Then when you index a date such as 2016-01-22T10:01:23Z, you'll get each of the date sub-fields populated with the relevant part, i.e.
- date: 2016-01-22T10:01:23Z
- date.year: 2016
- date.month: 01
- date.day: 22
- date.hour: 10
- date.minute: 01
- date.second: 23
You're then free to aggregate on any of those sub-fields to get what you want.
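For example, a terms aggregation on the day-of-month sub-field might look like this (the index name is a placeholder):

```
POST my_index/_search
{
  "size": 0,
  "aggs": {
    "by_day": {
      "terms": {
        "field": "date.day"
      }
    }
  }
}
```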
Upvotes: 6
Reputation: 12672
I think your only option is a scripted upsert, which will allow you to run scripts while indexing.
I created a basic index like this:
POST user_index
{
  "mappings": {
    "users": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format": "yyyy-MM-dd'T'HH:mm:ssZ"
        },
        "month": {
          "type": "string"
        },
        "day_of_week": {
          "type": "string"
        },
        "name": {
          "type": "string"
        }
      }
    }
  }
}
Then you should index your documents like this:
POST user_index/users/111/_update/
{
  "scripted_upsert": true,
  "script": "ctx._source.month = DateTime.parse('2014-03-01T10:30:00').toString('MMMM');ctx._source.day_of_week = DateTime.parse('2014-03-01T10:30:00').dayOfWeek().getAsText()",
  "upsert": {
    "name": "Brad Smith",
    "timestamp": "2014-03-01T10:30:00Z"
  }
}
It will index the document like this. More on datetime manipulation:
{
  "_index": "user_index",
  "_type": "users",
  "_id": "111",
  "_score": 1,
  "_source": {
    "timestamp": "2014-03-01T10:30:00Z",
    "day_of_week": "Saturday",
    "name": "Brad Smith",
    "month": "March"
  }
}
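The derivation the Groovy script performs can be sketched in Python for clarity (illustrative only, not part of the answer's setup):

```python
from datetime import datetime

# Parse the same timestamp the script parses and derive the same two fields.
parsed = datetime.strptime("2014-03-01T10:30:00", "%Y-%m-%dT%H:%M:%S")

month = parsed.strftime("%B")        # full month name, e.g. "March"
day_of_week = parsed.strftime("%A")  # full weekday name, e.g. "Saturday"
print(month, day_of_week)  # -> March Saturday
```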
Now you can perform aggregations with ease. Also note that you would have to enable dynamic scripting for this; it would be better to put the script in the config/scripts folder and pass timestamp as a param. You might also want to put everything inside the script, depending on your requirements.
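A parameterized variant might look like the following (the script file name and param name are my own, and the exact request shape for file-based scripts varies by ES version):

```
POST user_index/users/111/_update/
{
  "scripted_upsert": true,
  "script": {
    "file": "set_date_parts",
    "params": {
      "ts": "2014-03-01T10:30:00"
    }
  },
  "upsert": {
    "name": "Brad Smith",
    "timestamp": "2014-03-01T10:30:00Z"
  }
}
```

with config/scripts/set_date_parts.groovy containing the same two assignments, reading `ts` from the params instead of a hard-coded timestamp.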
Hope this helps!!
Upvotes: 2