Mapping with thousands of fields design options

Question

I am trying to design a mapping for a general-purpose list of "terms", each with a label and a value, like this:

terms = [
  { label: "Start Date", value: "2017/12/11" }, <- this is a date
  { label: "End Date", value: "2027/12/11" }, 
  { label: "Owner", value: "Monsters INC." }, <- this is text
  { label: "Fees", value: "1000$" } <- this is a numeric field
]

While all documents share several common fields, I have several different document templates, and users will be able to add custom terms with different data types to the list.

I need to query documents using boolean logic, for example: "get the documents where the start date is last year, fees are less than 1000$, and owner is 'Monsters INC.'"
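As a rough sketch of the kind of query meant here, the example above could be expressed as an Elasticsearch bool query body, built below as a plain Python dict. The field names `start_date`, `fees`, and `owner`, the helper `build_example_query`, and the ISO date strings are assumptions for illustration only; they presume the values are indexed with proper date, numeric, and text types.

```python
from datetime import date


def build_example_query(today: date) -> dict:
    """Build a bool query body (plain dict) for the example in the question.

    Field names are hypothetical and assume typed fields exist in the index.
    """
    last_year = today.year - 1
    return {
        "query": {
            "bool": {
                "filter": [
                    # start date falls anywhere in the previous calendar year
                    {"range": {"start_date": {
                        "gte": f"{last_year}-01-01",
                        "lte": f"{last_year}-12-31",
                    }}},
                    # fees strictly below 1000
                    {"range": {"fees": {"lt": 1000}}},
                    # owner matched as analyzed text
                    {"match": {"owner": "Monsters INC."}},
                ]
            }
        }
    }


query = build_example_query(date(2018, 6, 1))
```

All three clauses sit in the `filter` list, so a document must satisfy every one of them.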

The list of terms is fairly large (thousands), and more can be added by users or by the development team.

I have explored two solutions to this problem:

Storing as a nested object:

The mapping looks like this:

"terms": {
    "type": "nested",
    "properties": {
        "label": { "type": "string" },
        "value": { "type": "string" },
        "source": { "type": "string" },
        "page": { "type": "string" }
    }
}

Pros: No need to rebuild the index when new terms are added; smaller mapping.

Cons:

Queries are harder, since each one must check that the label and the value belong to the same nested object.

Since all values are strings, there is no way to use lt or gt for numeric or date comparisons.

It might be possible to implement lt and gt using casting, but it seems slow (which defeats the purpose of ES).
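To make these cons concrete, here is a minimal sketch in Python of what a query against the nested mapping above might look like for a single term, followed by a plain-Python illustration of why comparing string values is unreliable for numbers. The helper name `nested_term_query` is hypothetical; the query body is only a dict and is not sent to any cluster.

```python
def nested_term_query(label: str, value: str) -> dict:
    """Match documents that contain one nested term with this label AND value.

    Both conditions live inside the same nested query so that they apply to
    the same entry of the "terms" list, not to two different entries.
    """
    return {
        "nested": {
            "path": "terms",
            "query": {
                "bool": {
                    "filter": [
                        {"match": {"terms.label": label}},
                        {"match": {"terms.value": value}},
                    ]
                }
            },
        }
    }


owner_query = nested_term_query("Owner", "Monsters INC.")

# Strings are ordered character by character, so numeric-looking strings
# do not sort the way the numbers themselves do.
string_order = "1000" < "999"   # True: "1" sorts before "9"
number_order = 1000 < 999       # False
```

Every additional term in the boolean condition needs its own nested clause of this shape, which is where the extra query complexity comes from.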

Creating a big mapping:

Just create one big object with every single possible term:

{
   "Start Date": { "type": "date" },
   "End Date": { "type": "date" },
   "Owner": { "type": "text" },
   "Fees": { "type": "integer" },
    ... add as many terms as needed
}

Pros: Queries become straightforward; gt and lt work; any needed optimization can be applied to each field (exact fields, keyword fields, etc.).

Cons: Big, sparse mappings are not recommended by ES, since every document shares the same underlying data structure.

More work keeping the term list updated

Terms with the same name might clash if they have different data types
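As an illustration of the bookkeeping this option implies, the sketch below generates the flat mapping from a term dictionary and refuses terms whose label is already registered with a different type. It is written in Python under stated assumptions: the helper `build_flat_mapping` and the `(label, type)` pair format are hypothetical, not part of any Elasticsearch client.

```python
from typing import Dict, Iterable, Tuple


def build_flat_mapping(terms: Iterable[Tuple[str, str]]) -> Dict[str, dict]:
    """Turn (label, es_type) pairs into a flat "properties" mapping.

    Raises ValueError when the same label appears with two different types.
    """
    properties: Dict[str, dict] = {}
    for label, es_type in terms:
        existing = properties.get(label)
        if existing is not None and existing["type"] != es_type:
            raise ValueError(
                f"term {label!r} is declared as both "
                f"{existing['type']!r} and {es_type!r}"
            )
        properties[label] = {"type": es_type}
    return properties


# The four terms from the example in the question.
term_dictionary = [
    ("Start Date", "date"),
    ("End Date", "date"),
    ("Owner", "text"),
    ("Fees", "integer"),
]
mapping = build_flat_mapping(term_dictionary)
```

Note that Elasticsearch 5.x also caps the number of fields per index through the `index.mapping.total_fields.limit` setting (default 1000), so a dictionary of more than 1000 terms would require raising that limit.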

Is there any solution to this pattern offered by ES? Any help appreciated.

We are currently using ES 5.5. There are currently 1400 terms in the term dictionary.
