Animesh Pandey

Reputation: 6018

Multiple document types with the same mapping in Elasticsearch

I have an index named test that can hold n document types, named sub_text_1 through sub_text_n. All of them will have the same mapping.

Is there any way to set up an index so that all document types share the same mapping for their documents? I.e. test/sub_text1/_mapping should be the same as test/sub_text2/_mapping.

Otherwise if I have like 1000 document types, I will be having 1000 mappings of the same type referring to each document type.

UPDATE:

PUT /test_index/
{
  "settings": {
    "index.store.type": "default",
    "index": {
        "number_of_shards": 5,
        "number_of_replicas": 1,
        "refresh_interval": "60s"
    },
    "analysis": {
        "filter": {
            "porter_stemmer_en_EN": {
                "type": "stemmer",
                "name": "porter"
            },
            "default_stop_name_en_EN": {
                "type": "stop",
                "name": "_english_"
            },
            "snowball_stop_words_en_EN": {
                "type": "stop",
                "stopwords_path": "snowball.stop"
            },
            "smart_stop_words_en_EN": {
                "type": "stop",
                "stopwords_path": "smart.stop"
            },
            "shingle_filter_en_EN": {
                "type": "shingle",
                "min_shingle_size": "2",
                "max_shingle_size": "2",
                "output_unigrams": true
            }
        }
    }
  }
}

Intended mapping:

{
  "sub_text" : {
    "properties" : {
      "_id" : {
        "include_in_all" : false,
        "type" : "string",
        "store" : true,
        "index" : "not_analyzed"
      },
      "alternate_id" : {
        "include_in_all" : false,
        "type" : "string",
        "store" : true,
        "index" : "not_analyzed"
      },
      "text" : {
        "type" : "multi_field",
        "fields" : {
          "text" : {
            "type" : "string",
            "store" : true,
            "index" : "analyzed"
          },
          "pdf": {
            "type" : "attachment",
            "fields" : {
                "pdf" : {
                    "type" : "string",
                    "store" : true,
                    "index" : "analyzed"
                }
            }
          }
        }
      }
    }
  }
}

I want each sub_text I create to get its own copy of this mapping, so that I can change it for one sub_text without affecting the others. For example, I may want to add two custom analyzers to sub_text1 and three analyzers to sub_text3, while the rest stay the same.

UPDATE:

PUT /my-index/document_set/_mapping
{
  "properties": {
    "type": {
      "type": "string",
      "index": "not_analyzed"
    },
    "doc_id": {
      "type": "string",
      "index": "not_analyzed"
    },
    "plain_text": {
      "type": "string",
      "store": true,
      "index": "analyzed"
    },
    "pdf_text": {
      "type": "attachment",
      "fields": {
        "pdf_text": {
          "type": "string",
          "store": true,
          "index": "analyzed"
        }
      }
    }
  }
}

POST /my-index/document_set/1
{
  "type": "d1",
  "doc_id": "1",
  "plain_text": "simple text for doc1."
}

POST /my-index/document_set/2
{
  "type": "d1",
  "doc_id": "2",
  "pdf_text": "cGRmIHRleHQgaXMgaGVyZS4="
}

POST /my-index/document_set/3
{
  "type": "d2",
  "doc_id": "3",
  "plain_text": "simple text for doc3 in d2."
}

POST /my-index/document_set/4
{
  "type": "d2",
  "doc_id": "4",
  "pdf_text": "cGRmIHRleHQgaXMgaGVyZSBpbiBkMi4="
}

GET /my-index/document_set/_search
{
  "query" : {
    "filtered" : {
      "filter" : {
        "term" : {
          "type" : "d1"
        }
      }
    }
  }
}

This gives me the documents whose type field is "d1". How can I add analyzers only to documents of type "d1"?

Upvotes: 0

Views: 2518

Answers (2)

pickypg

Reputation: 22332

Do not do this.

Otherwise if I have like 1000 document types, I will be having 1000 mappings of the same type referring to each document type.

You're exactly right. For every additional _type with an identical mapping you are needlessly adding to the size of your index's mapping. They will not be merged, nor will any compression save you.

A much better solution is to simply create a shared _type and to create a field that represents the intended type. This completely avoids having wasted mappings and all of the negatives associated with it, including an unnecessary increase for your cluster state's size.

From there, you can imitate what Elasticsearch is doing for you and filter on your custom type without ballooning your mappings.

$ curl -XPUT localhost:9200/my-index -d '{
  "mappings" : {
    "my-type" : {
      "properties" : {
        "type" : {
          "type" : "string",
          "index" : "not_analyzed"
        },
        # ... whatever other mappings exist ...
      }
    }
  }
}'
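
With a mapping like that in place, documents from every logical type go into the one shared _type, each carrying its logical type in the type field. As a rough sketch only: the helper below builds a newline-delimited body for the _bulk API and tags each document. The function name bulk_index_body, the (doc_id, logical_type, source) tuple layout and the sample field names are assumptions made for illustration, not part of any client library.

```python
import json


def bulk_index_body(index, shared_type, docs):
    """Build an NDJSON body for POST /_bulk.

    docs is an iterable of (doc_id, logical_type, source) tuples
    (a hypothetical layout chosen for this sketch).
    """
    lines = []
    for doc_id, logical_type, source in docs:
        # Action line: every document goes into the same shared _type.
        action = {"index": {"_index": index, "_type": shared_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: copy the document and record its logical type in a field.
        tagged = dict(source, type=logical_type)
        lines.append(json.dumps(tagged))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    docs = [
        ("1", "sub_text1", {"plain_text": "simple text for doc1."}),
        ("3", "sub_text2", {"plain_text": "simple text for doc3."}),
    ]
    print(bulk_index_body("my-index", "my-type", docs), end="")
```

The resulting string can be sent as the request body of POST /_bulk with whatever HTTP client you already use.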

Then, for any search against sub_text1 (etc.), you can use a term filter (for one type) or a terms filter (for more than one) to imitate the _type filter that Elasticsearch would otherwise apply for you.

$ curl -XGET localhost:9200/my-index/my-type/_search -d '{
  "query" : {
    "filtered" : {
      "filter" : {
        "term" : {
          "type" : "sub_text1"
        }
      }
    }
  }
}'

This does the same thing as the _type filter. If you want the higher-level search capability without exposing the filtering logic to clients, you can create _aliases that contain the filter.
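
To make the term/terms choice and the filtered alias concrete, here is a minimal sketch that only builds the request bodies. It assumes the same filtered query syntax used above and the standard "add" action of POST /_aliases with a "filter" entry. The helper names (type_filter, filtered_search_body, filtered_alias_actions) and the alias name are made up for this example.

```python
import json


def type_filter(types):
    """Return a term filter for one type value, or a terms filter for several."""
    types = list(types)
    if len(types) == 1:
        return {"term": {"type": types[0]}}
    return {"terms": {"type": types}}


def filtered_search_body(types):
    """Search body restricted to documents whose type field matches."""
    return {"query": {"filtered": {"filter": type_filter(types)}}}


def filtered_alias_actions(index, alias, types):
    """Body for POST /_aliases that adds an alias carrying the type filter."""
    return {
        "actions": [
            {"add": {"index": index, "alias": alias, "filter": type_filter(types)}}
        ]
    }


if __name__ == "__main__":
    # Search two logical types at once.
    print(json.dumps(filtered_search_body(["sub_text1", "sub_text2"]), indent=2))
    # Alias "sub_text1" (assumed name) that only ever sees sub_text1 documents.
    print(json.dumps(filtered_alias_actions("my-index", "sub_text1", ["sub_text1"]), indent=2))
```

Once such an alias exists, clients can search it like an ordinary index and the filter is applied on the server side.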

Upvotes: 0

Mr Hash

Reputation: 685

At the moment, a possible solution is to use index templates or dynamic mapping. However, they do not allow wildcard type matching, so you would have to use the _default_ root type to apply the mappings to all types in the index. It is then up to you to ensure that the same dynamic mapping suits all of your types. This template example may work for you:

curl -XPUT localhost:9200/_template/template_1 -d '
{
    "template" : "test",
    "mappings" : {
        "_default_" : {
            "dynamic": true,
            "properties": {
                "field1": {
                   "type": "string",
                   "index": "not_analyzed"
                }
            }
        }
    }
}
'
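
If a few types need to differ from the rest, one option might be to keep _default_ as the base and list explicit mappings only for the exceptions. The sketch below just assembles such a template body. It assumes that an explicitly named type is merged on top of _default_, which I have not verified against a live cluster and which may vary between Elasticsearch versions. The function name build_template, the plain_text field and the choice of the built-in english analyzer are illustrative assumptions.

```python
import json


def build_template(index_pattern, default_properties, overrides=None):
    """Assemble an index template body with a _default_ mapping plus
    optional per-type property overrides (hypothetical helper)."""
    mappings = {
        "_default_": {"dynamic": True, "properties": default_properties}
    }
    # Only the types that need something different are listed explicitly.
    for type_name, properties in (overrides or {}).items():
        mappings[type_name] = {"properties": properties}
    return {"template": index_pattern, "mappings": mappings}


if __name__ == "__main__":
    body = build_template(
        "test",
        {"field1": {"type": "string", "index": "not_analyzed"}},
        # Assumed example: only sub_text1 gets an analyzed field.
        {"sub_text1": {"plain_text": {"type": "string", "analyzer": "english"}}},
    )
    print(json.dumps(body, indent=4))
```

The printed JSON can be used as the body of the PUT _template/template_1 request shown above.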

Upvotes: 1
