bornTalented

Reputation: 625

Alternative to too many fields in an Elasticsearch document

I am using Elasticsearch 5, which limits the maximum number of fields in an index. From the documentation:

The maximum number of fields in an index. The default value is 1000.

I have a _type named 'customer' in the _index 'company', and a customer can have a very large number of fields (say 2000) in its document.

Solution1:

We can meet this requirement by raising the limit in the settings of the company index:

PUT company/_settings
{
  "index.mapping.total_fields.limit": 2000
}

and then putting the customer mapping like this:

PUT /company/_mapping/customer
{
   "customer": {
      "properties": {
         "property1": {
            "type": "text"
         },
         "property2": {
            "type": "text"
         },
         .
         .
         .
         "property2000": {
            "type": "text"
         }
      }
   }
}

Problem:

This solution leads to a data sparsity problem, since not every customer has all of the properties.

Solution2:

We can create a separate _type for customer properties (say custom_props) with the following mapping:

PUT /company/_mapping/custom_props
{
   "custom_props": {
      "_parent": {
         "type": "customer"
      },
      "_routing": {
         "required": true
      },
      "properties": {
         "property_name": {
            "type": "text"
         },
         "property_value": {
            "type": "text"
         }
      }
   }
}

Now each property of a customer is stored as a separate document in custom_props.

Problem:

To search for a particular customer with certain properties, we need a has_child query, and in some use cases a has_child query with inner_hits. According to the ES documentation, these queries are much slower than simple search queries.
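For reference, the kind of request body this implies can be sketched in Python. This is only an illustration against the custom_props mapping above: the helper name customers_by_child_property and its parameters are made up for the example, and the snippet only builds the JSON body without sending it anywhere.

```python
from typing import Any, Dict


def customers_by_child_property(
    name: str, value: str, with_inner_hits: bool = False
) -> Dict[str, Any]:
    """Build a has_child search body for the custom_props child type.

    Hypothetical helper: it matches child documents whose property_name
    and property_value (both mapped as text above) match the given values.
    """
    has_child: Dict[str, Any] = {
        "type": "custom_props",
        "query": {
            "bool": {
                "must": [
                    {"match": {"property_name": name}},
                    {"match": {"property_value": value}},
                ]
            }
        },
    }
    if with_inner_hits:
        # An empty inner_hits object asks for the matching child docs as well.
        has_child["inner_hits"] = {}
    return {"query": {"has_child": has_child}}


body = customers_by_child_property("name", "John", with_inner_hits=True)
```

The resulting body would be posted to /company/customer/_search. Searching on several properties at once needs one has_child clause per property, which adds to the cost.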

So I am looking for the best alternative for handling the case where an Elasticsearch _index has too many fields.

Upvotes: 1

Views: 2572

Answers (1)

Nikolay Vasiliev

Reputation: 6066

There is one way of handling relations in Elasticsearch that you didn't consider: nested objects. They are similar to parent/child but usually have better query performance.

With the nested data type, the mapping might look like this:

PUT /company/
{
  "mappings": {
    "customer": {
      "properties": {
        "commonProperty": {
          "type": "text"
        },
        "customerSpecific": {
          "type": "nested",
          "properties": {
            "property_name": {
              "type": "keyword"
            },
            "property_value": {
              "type": "text"
            }
          }
        }
      }
    }
  }
}

Let's see what a document will look like:

POST /company/customer/1
{
  "commonProperty": "companyID1",
  "customerSpecific": [
    {
      "property_name": "name",
      "property_value": "John Doe"
    },
    {
      "property_name": "address",
      "property_value": "Simple Rd, 112"
    }
  ]
}

POST /company/customer/2
{
  "commonProperty": "companyID1",
  "customerSpecific": [
    {
      "property_name": "name",
      "property_value": "Jane Adams"
    },
    {
      "property_name": "address",
      "property_value": "42 St., 15"
    }
  ]
}
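If the customer data starts out as a flat set of key/value pairs, the conversion into this shape is mechanical. Here is a minimal Python sketch; the function names to_nested_properties and build_customer_doc are made up for the example, and turning every value into a string is an assumption that fits the text mapping of property_value.

```python
from typing import Any, Dict, List


def to_nested_properties(props: Dict[str, Any]) -> List[Dict[str, str]]:
    """Turn a flat dict into a list of property_name/property_value objects."""
    return [
        {"property_name": name, "property_value": str(value)}
        for name, value in props.items()
        if value is not None  # absent properties are simply not stored
    ]


def build_customer_doc(common_property: str, props: Dict[str, Any]) -> Dict[str, Any]:
    """Build a document body matching the nested mapping above."""
    return {
        "commonProperty": common_property,
        "customerSpecific": to_nested_properties(props),
    }


doc = build_customer_doc(
    "companyID1",
    {"name": "John Doe", "address": "Simple Rd, 112", "fax": None},
)
```

Because a customer only stores the properties it actually has, the mapping keeps the same few fields no matter how many distinct property names exist across customers.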

To query such data we have to use a nested query. For instance, to find a customer whose name matches "John" we might use a query like this:

POST /company/customer/_search
{
  "query": {
    "nested": {
      "path": "customerSpecific",
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "customerSpecific.property_name": "name"
              }
            },
            {
              "match": {
                "customerSpecific.property_value": "John"
              }
            }
          ]
        }
      }
    }
  }
}
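When a search involves several properties at once, each condition needs its own nested clause, because every nested object holds exactly one name/value pair; putting two different property names into the same nested clause would never match. A small Python sketch of building such a body (the helper names property_clause and customers_with_properties are made up for the example, and nothing is sent to a cluster here):

```python
from typing import Any, Dict


def property_clause(name: str, value: str) -> Dict[str, Any]:
    """One nested clause: property_name equals name AND property_value matches value."""
    return {
        "nested": {
            "path": "customerSpecific",
            "query": {
                "bool": {
                    "must": [
                        {"term": {"customerSpecific.property_name": name}},
                        {"match": {"customerSpecific.property_value": value}},
                    ]
                }
            },
        }
    }


def customers_with_properties(conditions: Dict[str, str]) -> Dict[str, Any]:
    """Combine one nested clause per property into a single bool/must query."""
    clauses = [property_clause(name, value) for name, value in conditions.items()]
    return {"query": {"bool": {"must": clauses}}}


body = customers_with_properties({"name": "John", "address": "Simple"})
```

With a single condition, the nested clause produced here is the same one used in the query above, just wrapped in an outer bool.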

Hope that helps!

Upvotes: 2
