AbtPst

Reputation: 8018

Elasticsearch: Default template does not detect dates

I have a default template in place that looks like this:

PUT /_template/abtemp
{
  "template": "abt*",
  "settings": {
    "index.refresh_interval": "5s",
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "index.codec": "best_compression"
  },
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": false
      },
      "_source": {
        "enabled": true
      },
      "dynamic_templates": [
        {
          "message_field": {
            "match": "message",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "omit_norms": true,
              "fielddata": {
                "format": "disabled"
              }
            }
          }
        },
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "omit_norms": true,
              "fielddata": {
                "format": "disabled"
              },
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ]
    }
  }
}

The idea here is:

  1. Apply the template to all indices whose name matches abt*
  2. Only analyze a string field if it is named message. All other string fields will be not_analyzed and will have a corresponding .raw field
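
Dynamic templates are evaluated in order, and the first one whose conditions match a new field is applied. A minimal Python sketch of that selection rule, assuming only the match and match_mapping_type conditions used in the template above; pick_template is a hypothetical helper for illustration, not part of any Elasticsearch client:

```python
from fnmatch import fnmatchcase

# (template name, "match" pattern, "match_mapping_type"),
# listed in the same order as in the index template.
DYNAMIC_TEMPLATES = [
    ("message_field", "message", "string"),
    ("string_fields", "*", "string"),
]


def pick_template(field_name, detected_type):
    """Return the name of the first dynamic template that matches, or None."""
    for name, pattern, mapping_type in DYNAMIC_TEMPLATES:
        if detected_type == mapping_type and fnmatchcase(field_name, pattern):
            return name
    return None


print(pick_template("message", "string"))
print(pick_template("FIELD3", "string"))
print(pick_template("FIELD31", "long"))
```

Under this simplified model, every string field other than message falls through to string_fields and receives that template's mapping, including its .raw sub-field. A field detected as long matches neither template and keeps the default mapping.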

Now I try to index some data into it with

curl -s -XPOST hostName:port/indexName/_bulk --data-binary @myFile.json

Here is the file:

{ "index" : { "_index" : "abtclm3","_type" : "test"} }
{   "FIELD1":1,   "FIELD2":"2015-11-18 15:32:18",   "FIELD3":"MATTHEWS",   "FIELD4":"GARY",   "FIELD5":"",   "FIELD6":"STARMX",   "FIELD7":"AL",   "FIELD8":"05/15/2010 11:30",   "FIELD9":"05/19/2010 7:00",   "FIELD10":"05/19/2010 23:00",   "FIELD11":3275,   "FIELD12":"LC",   "FIELD13":"WIN",   "FIELD14":"05/15/2010 11:30",   "FIELD15":"LC",   "FIELD16":"POTUS",   "FIELD17":"WH",   "FIELD18":"S GROUNDS",   "FIELD19":"OFFICE",   "FIELD20":"VISITORS",   "FIELD21":"STATE ARRIVAL - MEXICO**",   "FIELD22":"08/27/2010 07:00:00 AM +0000",   "FIELD23":"MATTHEWS",   "FIELD24":"GARY",   "FIELD25":"",   "FIELD26":"STARMX",   "FIELD27":"AL",   "FIELD28":"05/15/2010 11:30",   "FIELD29":"05/19/2010 7:00",   "FIELD30":"05/19/2010 23:00",   "FIELD31":3275,   "FIELD32":"LC",   "FIELD33":"WIN",   "FIELD34":"05/15/2010 11:30",   "FIELD35":"LC",   "FIELD36":"POTUS",   "FIELD37":"WH",   "FIELD38":"S GROUNDS",   "FIELD39":"OFFICE",   "FIELD40":"VISITORS",   "FIELD41":"STATE ARRIVAL - MEXICO**",   "FIELD42":"08/27/2010 07:00:00 AM +0000" }
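
The bulk API expects each action line and each document to be a complete JSON object on its own line, so it can help to check the file before sending it. A minimal Python sketch, assuming the two-line layout shown above; find_invalid_bulk_lines is a hypothetical helper and the sample payload is illustrative only:

```python
import json


def find_invalid_bulk_lines(text):
    """Return (line_number, error_message) for every non-empty line
    that does not parse as JSON on its own."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines such as the trailing newline
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
    return problems


# Illustrative payload: the second line has a stray double quote.
sample = (
    '{ "index" : { "_index" : "abtclm3", "_type" : "test" } }\n'
    '{ "FIELD1": 1, "FIELD2": "2015-11-18 15:32:18"", "FIELD3": "MATTHEWS" }\n'
)

for number, message in find_invalid_bulk_lines(sample):
    print("line", number, "is not valid JSON:", message)
```

To check the real file, the same function could be applied to the contents of myFile.json read with open(...).read().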

Note that a few fields, such as FIELD2, should be classified as dates, and FIELD31 should be classified as long. The indexing succeeds, but when I look at the data I see that the numbers have been classified correctly while everything else has been mapped as string. How do I make sure that the fields containing timestamps are classified as dates?

Upvotes: 0

Views: 688

Answers (1)

Andrei Stefan

Reputation: 52366

You have a lot of date formats there. You need a template like this one:

{
  "template": "abt*",
  "settings": {
    "index.refresh_interval": "5s",
    "number_of_shards": 5,
    "number_of_replicas": 1,
    "index.codec": "best_compression"
  },
  "mappings": {
    "_default_": {
      "dynamic_date_formats":["dateOptionalTime||yyyy-MM-dd HH:mm:ss||MM/dd/yyyy HH:mm||MM/dd/yyyy hh:mm:ss aa ZZ"],
      "_all": {
        "enabled": false
      },
      "_source": {
        "enabled": true
      },
      "dynamic_templates": [
        {
          "message_field": {
            "match": "message",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "omit_norms": true,
              "fielddata": {
                "format": "disabled"
              }
            }
          }
        },
        {
          "dates": {
            "match": "*",
            "match_mapping_type": "date",
            "mapping": {
              "type": "date",
              "format": "dateOptionalTime||yyyy-MM-dd HH:mm:ss||MM/dd/yyyy HH:mm||MM/dd/yyyy hh:mm:ss aa ZZ"
            }
          }
        },
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "omit_norms": true,
              "fielddata": {
                "format": "disabled"
              },
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ]
    }
  }
}

This probably doesn't cover all the formats you have in there; you need to add the remaining ones. The idea is to list the formats under dynamic_date_formats, separated by ||, and to repeat the same list under the format setting of the date mapping itself.
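
One way to see which values are still uncovered is to run the sample strings against the candidate formats offline. The following Python sketch is only an approximation: the strptime directives are assumed stand-ins for the Joda-style patterns Elasticsearch uses (where MM is the month and mm the minute), the two parsers do not behave identically, and first_matching_pattern is a hypothetical helper:

```python
from datetime import datetime

# Joda-style pattern -> assumed strptime equivalent (an approximation only).
CANDIDATES = {
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
    "MM/dd/yyyy HH:mm": "%m/%d/%Y %H:%M",
    "MM/dd/yyyy hh:mm:ss aa ZZ": "%m/%d/%Y %I:%M:%S %p %z",
}


def first_matching_pattern(value):
    """Return the first Joda-style pattern whose strptime stand-in
    parses the value, or None if no candidate matches."""
    for joda_pattern, strptime_format in CANDIDATES.items():
        try:
            datetime.strptime(value, strptime_format)
        except ValueError:
            continue
        return joda_pattern
    return None


samples = [
    "2015-11-18 15:32:18",
    "05/19/2010 7:00",
    "08/27/2010 07:00:00 AM +0000",
    "MATTHEWS",
]

for value in samples:
    print(repr(value), "->", first_matching_pattern(value))
```

Any date-like value that comes back as None points to a format that still needs to be added to both lists. The final check should be made against Elasticsearch itself, for example by indexing into a fresh test index and inspecting the resulting mapping.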

To get an idea of how to define them, see the Elasticsearch documentation on built-in date formats, and the documentation on custom date format patterns for any custom formats you plan to use.

Upvotes: 1
