Radim

Reputation: 184

Denormalization vs Child/Parent & Nesting

We are designing an Elasticsearch model for events, their schedules, and the venues where the events take place. The design is as follows:

(image: model)

Examples of queries we might need:

Find events that are Concerts between 1/7/2017 and 7/7/2017

Find artists who perform in London where the event is a Theatre play

Find events that are Movies and have a Score > 70%

Find users who attend the event AwesomeEvent

Find venues whose locality is London and that have any event planned from today onward

I've read the Elastic docs, a few articles, and some Stack Overflow questions, but I'm still not sure about our model because it's very specific.

Examples of possible usage:

1) Using the nested pattern

{
  "title": "Event",
  "body":  "This great event is going to be...",
  "Schedules": [ 
    {
      "name":    "Schedule 1",
      "start":   "7.7.2017",
      "end":     "8.7.2017"
    },
    {
      "name":    "Schedule 2",
      "start":   "10.7.2017",
      "end":   "11.7.2017"
    }
  ],
  "Performers": [ 
    {
      "name":    "Performer 1",
      "genre":   "Rock"
    },
    {
      "name":    "Performer 2",
      "genre":   "Pop"
    }
  ],
  ...
}

Pros:

  1. Flatter model that sticks to the "key: value" approach
  2. Each entity carries all of its information by itself

Cons:

  1. A lot of redundant data
  2. More complex entities
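One caveat for option 1 is worth illustrating. When an array of objects is indexed with the default object type rather than the nested type, Elasticsearch flattens the sub-fields into multi-valued fields, so the link between a performer's name and genre is lost. The following is a plain-Python simulation of that behaviour, not real Elasticsearch code; the helper names are made up for illustration.

```python
# Simplified model of how an array of plain objects is indexed:
# each sub-field becomes one multi-valued field, so the pairing between
# "name" and "genre" inside a single performer is lost.
def flatten(objects):
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, set()).add(value)
    return flat


def matches_flattened(objects, **criteria):
    """Match against the flattened view (default object type)."""
    flat = flatten(objects)
    return all(value in flat.get(key, set()) for key, value in criteria.items())


def matches_nested(objects, **criteria):
    """Match each object on its own, which is what the nested type preserves."""
    return any(
        all(obj.get(key) == value for key, value in criteria.items())
        for obj in objects
    )


performers = [
    {"name": "Performer 1", "genre": "Rock"},
    {"name": "Performer 2", "genre": "Pop"},
]

# "Performer 1" never plays Pop, yet the flattened view reports a match.
print(matches_flattened(performers, name="Performer 1", genre="Pop"))  # True
print(matches_nested(performers, name="Performer 1", genre="Pop"))     # False
```

This is why Schedules and Performers would need to be mapped as nested if queries combine several sub-fields of the same schedule or performer.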

2) Parent/child relation between the following entities (simplified)

{
  "title": "Event",
  "body":  "This great event is going to be..."
}

{
  "title": "Schedule",
  "start":   "7.7.2017",
  "end":     "8.7.2017"
}

{
  "name":    "Performer",
  "genre":   "Rock"
}

Pros:

  1. Avoids duplicating data

Cons:

  1. More joins (even though parent and child are stored on the same shard)
  2. The model is less flat, and I'm not sure about the performance

So far we have a relational database where the model works fine, but it's not fast enough. Imagine a cinema, for example: one event (movie) can have thousands of schedules in different localities, and we want very fast responses for the kinds of filtering I described in the first part.

I'd appreciate any suggestions that lead to a properly designed data model. I'd also be glad to have my assumptions reviewed (some of them are probably wrong).

Upvotes: 1

Views: 1408

Answers (1)

xProgramery

Reputation: 517

It's hard to denormalize your data into flat fields. For example, the number of performers in an event is unknown, so if you were to have specific fields for performers, you would need performer1.firstname, performer1.lastname, performer2.firstname, performer2.lastname, etc. If you use a nested field instead, you simply define a nested field Performer on the event index with the correct sub-field mappings, and then you can add as many performers as you want. This lets you look up events by performer or performers by event. The same applies to the rest of the indices.
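As a rough sketch of the nested approach, the request bodies could look like the Python dicts below. This assumes a version with typeless mappings (7.x or later; older versions wrap `properties` in a type name), uses the `Performers` field name from the question, and picks `keyword` sub-fields purely for illustration. The function name is hypothetical, and nothing here talks to a cluster.

```python
import json

# Index mapping: Performers is declared as "nested" so each performer
# object is indexed as its own hidden document.
event_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "body": {"type": "text"},
            "Performers": {
                "type": "nested",
                "properties": {
                    "name": {"type": "keyword"},   # assumed sub-field type
                    "genre": {"type": "keyword"},  # assumed sub-field type
                },
            },
        }
    }
}


def events_by_performer_genre(genre):
    """Build a search body for events that have a performer of this genre."""
    return {
        "query": {
            "nested": {
                "path": "Performers",
                "query": {"term": {"Performers.genre": genre}},
                # inner_hits also returns the performers that matched,
                # which covers the "performer by event" direction.
                "inner_hits": {},
            }
        }
    }


print(json.dumps(events_by_performer_genre("Rock"), indent=2))
```

The mapping body would be sent when creating the index and the query body to the search endpoint of that index.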

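As for parent-child vs. nested: parent-child provides more independence, because children are stored as separate documents (in the same index and on the same shard as their parent) and can be added or updated without reindexing the parent. Nested fields can specify the "include_in_parent" option to automatically copy their fields into the parent document for you; parent-child relations have no equivalent option.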
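For comparison, a parent/child layout might be sketched as follows. This assumes a version that has the `join` field type (older 5.x releases used a `_parent` mapping instead) and assumes dates are stored in the default ISO format rather than as "7.7.2017" strings. The field name `relation` and the function names are hypothetical.

```python
import json

# One index holds events and their children; the join field records
# which kind of document each one is and who its parent is.
relation_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "start": {"type": "date"},
            "end": {"type": "date"},
            "relation": {  # hypothetical field name
                "type": "join",
                "relations": {"event": ["schedule", "performer"]},
            },
        }
    }
}


def schedule_doc(parent_id, start, end):
    """Body of a schedule child; it must be indexed with routing=parent_id."""
    return {
        "title": "Schedule",
        "start": start,
        "end": end,
        "relation": {"name": "schedule", "parent": parent_id},
    }


def events_with_schedule_between(start, end):
    """Search body for events that have a schedule starting in the range."""
    return {
        "query": {
            "has_child": {
                "type": "schedule",
                "query": {"range": {"start": {"gte": start, "lte": end}}},
            }
        }
    }


print(json.dumps(events_with_schedule_between("2017-07-01", "2017-07-07"), indent=2))
```

Because each schedule is its own document, adding or changing one schedule does not require reindexing the event, at the cost of a join at query time.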

Upvotes: 1
