erbdex

Elasticsearch default mapping

My current understanding:

  1. Elasticsearch creates the mapping for an index the first time it receives JSON documents.
  2. This mapping cannot be changed afterwards, but the documents can be re-mapped.

Question:

Setting re-mapping aside, is there any way to tell ES to behave like this by default:

"Consider everything that is not a date to be of string type"?

Also, will I lose much if I do this?

Update:

I added the file config/mappings/_default/mapping.json with the following contents:

{
    "dynamic_templates": [
        {
            "template_1": {
                "match": "*",
                "match_mapping_type": "int",
                "mapping": {
                    "type": "string"
                }
            },
            "template_2": {
                "match": "*",
                "match_mapping_type": "long",
                "mapping": {
                    "type": "string"
                }
            }
        }
    ]
}

I also tried placing the following in config/default_mapping.json:

{
    "_default_" : {
        "match": "*",
        "match_mapping_type": "int",
        "mapping": {
                "type": "string"
        }
    }
}

My goal is to get rid of the errors that crop up when a field first indexed as an int or long later arrives as a string. Will this map all int and long values to strings in every index created from now on? Do I need to nest this dynamic_templates key within _all?

Update II:

Adding this mapping file causes Elasticsearch to cough up:

[2014-02-04 10:48:34,396][DEBUG][action.admin.indices.create] [Her] [logstash-2014.02.04] failed to create
org.elasticsearch.index.mapper.MapperParsingException: mapping [mapping.json]
    at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:312)
    at org.elasticsearch.cluster.service.InternalClusterService$UpdateTask.run(InternalClusterService.java:298)
    at org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:135)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)
Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be cast to java.util.Map
    at org.elasticsearch.index.mapper.DocumentMapperParser.extractMapping(DocumentMapperParser.java:268)
    at org.elasticsearch.index.mapper.DocumentMapperParser.parse(DocumentMapperParser.java:155)
    at org.elasticsearch.index.mapper.MapperService.parse(MapperService.java:314)
    at org.elasticsearch.index.mapper.MapperService.merge(MapperService.java:193)
    at org.elasticsearch.cluster.metadata.MetaDataCreateIndexService$2.execute(MetaDataCreateIndexService.java:309)
    ... 5 more
2014-02-04 10:48:34 +0000 [warn]: temporarily failed to flush the buffer. next_retry=2014-02-04 10:48:33 +0000 error_class="Net::HTTPServerException" error="400 \"Bad Request\"" instance=17509700

Answers (1)

javanna

When you start from scratch, without any mapping, you rely on the defaults. Every time you send a document, fields that haven't been mapped yet are mapped automatically based on their JSON type (plus some conventions for detecting dates in strings). As a result, if a field arrives as a number in your first document and the same field arrives as a string in your second document, the index operation for the second document will fail, for instance when the string is something like "n/a" that can't be parsed as a number.
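As an illustration, suppose (hypothetically) the first document indexed into a fresh index has type "event" and looks like {"count": 42}. Dynamic mapping would then record something along these lines. The type and field names are made up, and the exact output of the get mapping API varies between versions:

```json
{
  "event": {
    "properties": {
      "count": { "type": "long" }
    }
  }
}
```

A later document such as {"count": "n/a"} would then be rejected, because "n/a" cannot be parsed as a long.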

There are APIs to manage mappings, but that doesn't mean you have to declare all your fields: you can specify only the ones that should behave differently from the defaults. You can provide mappings when creating an index, add them with the put mapping API if the index already exists, or include them in index templates for indices that have yet to be created.
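A minimal sketch of such an index template, assuming Elasticsearch 1.x syntax (the "template" pattern key and the "string" type were replaced in later versions). The name you would PUT it under (for example /_template/numbers_as_strings) and the "logstash-*" pattern are assumptions based on the index name in the question's log. It asks ES to map every dynamically detected long or double field as a string in matching indices created afterwards:

```json
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "longs_as_strings": {
            "match_mapping_type": "long",
            "mapping": { "type": "string" }
          }
        },
        {
          "doubles_as_strings": {
            "match_mapping_type": "double",
            "mapping": { "type": "string" }
          }
        }
      ]
    }
  }
}
```

Note the shape: each element of the dynamic_templates array holds exactly one named template, and the array sits under a type name (_default_) rather than at the top level of the file. The top-level array in the question's mapping.json is likely what the "ArrayList cannot be cast to java.util.Map" error is about. Also, JSON integers appear to be detected as "long", so an "int" match_mapping_type would probably never match.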

Changing the mappings is possible, but only backwards-compatible changes can be applied. You can always add new fields, but you can't change the type or the analyzer of an existing field. In that case you could try to make the change backwards compatible by using multi-fields; otherwise you need to reindex your data against the updated mappings.
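For instance, assuming Elasticsearch 1.x and a hypothetical analyzed string field "status" in a type "my_type", a put mapping request (e.g. PUT /my_index/_mapping/my_type, index and type names hypothetical) could add a not_analyzed sub-field alongside the existing one instead of changing the field itself. In 0.90 the same idea was written with "type": "multi_field" instead:

```json
{
  "my_type": {
    "properties": {
      "status": {
        "type": "string",
        "fields": {
          "raw": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }
}
```

Documents indexed before the change only get values in status.raw once they are reindexed.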

As for your last question: if you index everything as a string, you lose what you can usually do with numbers, e.g. numeric range queries and numeric sorting, since strings are compared lexicographically ("10" sorts before "9"). Whether this is acceptable depends on your data and what you need to do with it.
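To make the trade-off concrete, here is a hypothetical range query on a field "count" (field name illustrative, query DSL as in Elasticsearch 1.x). On a long field it matches the values 9 and 10. On a string field the bounds are compared as terms lexicographically, where "10" sorts before "9", so the same query would not select those values:

```json
{
  "query": {
    "range": {
      "count": { "gte": 9, "lte": 10 }
    }
  }
}
```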
