Jay Rizzi
Jay Rizzi

Reputation: 4304

Elasticsearch indexed database table column structure

I have a question regarding the setup of my elasticsearch database index... I have created a table which I have rivered to index in elasticsearch. The table is built from a script that queries multiple tables to denormalize data making it easier to index by a unique id 1:1 ratio

An example of a set of fields I have is street, city, state, zip, which I can query on, but my question is , should I be keeping those fields individually indexed , or be concatenating them as one big field like address which contains all of the previous fields into one? Or be putting in the extra time to setup parent-child indexes?

The use case example is I have a customer with billing info coming from one direction, I want to query elasticsearch to see if that customer already exists, or at least return the closest result

I know this question is more conceptual than programming, I just can't find any information of best practices.

Upvotes: 3

Views: 684

Answers (1)

paweloque
paweloque

Reputation: 18864

Concatenation

For the first part of your question: I wouldn't concatenate the different fields into a field containing all information. Having multiple fields gives you the advantage of calculating facets and aggregates on those fields, e.g. how many customers are from a specific city or have a specific zip. You can still use a match or multimatch query to query for information from different fields.

In addition to having the information in separate fields I would use multifields with an analyzed and not_analyzed part (fieldname.raw). This again allows for aggregates, facets and sorting.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/mapping-multi-field-type.html

Think of 'New York': if you analyze it will be stored as ['New', 'York'] and you will not be able to see all People from 'New York'. What you'd see are all people from 'New' and 'York'.

_all field

There is a special _all field in elasticsearch which does the concatenation in the background. You don't have to do it yourself. It is possible to enable/disable it.

Parent Child relationship

Concerning the part whether to use nested objects or parent child relationship: I think that using a parent child relationship is more appropriate for your case. Nested objects are stored in a 'flattened' way, i.e. the information from the nested objects in arrays is stored as being part of one object. Consider the following example:

You have an order for a client:

client: 'Samuel Thomson'
    orderline: 'Strong Thinkpad'
    orderline: 'Light Macbook'
client: 'Jay Rizzi'
    orderline: 'Strong Macbook'

Using nested objects if you search for clients who ordered 'Strong Macbook' you'd get both clients. This because 'Samuel Thomson' and his orders are stored altogether, i.e. ['Strong' 'Thinkpad' 'Light' 'Macbook'], there is no distinction between the two orderlines.

By using parent child documents, the orderlines for the same client are not mixed and preserve their identity.

Upvotes: 3

Related Questions