WoJ

Reputation: 29997

What are the performance drawbacks of flat documents vs. nested ones?

I have data which naturally fit into documents like

{
  "name": "Multi G. Enre",
  "books": [
    {
      "name": "Guns and lasers",
      "genre": "scifi",
      "publisher": "orbit"
    },
    {
      "name": "Dead in the night",
      "genre": "thriller",
      "publisher": "penguin"
    }
  ]
}

(the example is taken from a good review of nested and has_child documents)

In order to analyze them in Kibana and other software (a mix of legacy and laziness), they are flattened:

{
  "name": "Multi G. Enre",
  "book_name": "Guns and lasers",
  "book_genre": "scifi",
  "book_publisher": "orbit"
}
{
  "name": "Multi G. Enre",
  "book_name": "Dead in the night",
  "book_genre": "thriller",
  "book_publisher": "penguin"
}
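For reference, the flattening described above can be sketched in a few lines of Python. This is only an illustration of the transformation: the function name `flatten_author` and the `book_` prefix rule are assumptions inferred from the two sample records, not the actual pipeline.

```python
def flatten_author(author: dict) -> list[dict]:
    """Return one flat record per book, repeating the author's name on each."""
    records = []
    for book in author.get("books", []):
        record = {"name": author["name"]}
        # Prefix every book field with "book_", as in the flattened samples.
        for key, value in book.items():
            record[f"book_{key}"] = value
        records.append(record)
    return records


author = {
    "name": "Multi G. Enre",
    "books": [
        {"name": "Guns and lasers", "genre": "scifi", "publisher": "orbit"},
        {"name": "Dead in the night", "genre": "thriller", "publisher": "penguin"},
    ],
}

flat_records = flatten_author(author)
for record in flat_records:
    print(record)
```

Each nested book becomes its own record, so the author's name is stored once per book.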

Besides the obvious growth in index size, is there generally a performance impact when querying such flat records (the queries are of the type "writer with scifi books from penguin") compared with nested documents, or with parent/child ones?
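To make the comparison concrete, here is roughly what the two query bodies might look like, written as Python dicts that serialize to the JSON query DSL. This is a sketch under assumptions: the filtered fields are taken to be mapped as `keyword` (so `term` matches exact values), and in the second case `books` is taken to be mapped with the `nested` type. Neither mapping is stated in the question.

```python
import json

# Flat records: both conditions apply to fields of the same top-level document.
flat_query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"book_genre": "scifi"}},
                {"term": {"book_publisher": "penguin"}},
            ]
        }
    }
}

# Nested documents: the conditions are wrapped in a "nested" query so that
# both must hold for the same entry of the "books" array.
nested_query = {
    "query": {
        "nested": {
            "path": "books",
            "query": {
                "bool": {
                    "filter": [
                        {"term": {"books.genre": "scifi"}},
                        {"term": {"books.publisher": "penguin"}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(flat_query, indent=2))
print(json.dumps(nested_query, indent=2))
```

With the sample data neither query should match, since no single book is both scifi and published by penguin. Note that the flat query returns one hit per matching book, so the writer names still have to be de-duplicated on the client or with an aggregation.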

Upvotes: 5

Views: 2128

Answers (1)

jhilden

Reputation: 12429

Querying the flat index will be much, MUCH better! The whole idea behind NoSQL databases is to denormalize your data.

In your first example, notice that you would need to update that record each time you add a book. That is a big no-no in ES/NoSQL. ES records should be treated as immutable: behind the scenes, an update is really a delete plus an insert of the whole document, which is very expensive.
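A toy model of that delete+insert behaviour, in plain Python. This is not Elasticsearch code: `ToyIndex` and its methods are made up purely for illustration, and the real storage engine is far more involved. It only shows why adding a book to the nested document rewrites the whole record, while the flat layout just appends one new record.

```python
class ToyIndex:
    """Append-only store: re-indexing an id tombstones the old version."""

    def __init__(self):
        self.entries = []  # each entry is [doc_id, doc, deleted]

    def index(self, doc_id, doc):
        for entry in self.entries:
            if entry[0] == doc_id and not entry[2]:
                entry[2] = True  # the "delete" half of an update
        self.entries.append([doc_id, doc, False])  # the "insert" half

    def live(self):
        return [entry[1] for entry in self.entries if not entry[2]]

    def stored(self):
        return len(self.entries)


# Nested layout: adding a third book means re-indexing the whole author document.
nested = ToyIndex()
nested.index("author-1", {"name": "Multi G. Enre", "books": ["b1", "b2"]})
nested.index("author-1", {"name": "Multi G. Enre", "books": ["b1", "b2", "b3"]})

# Flat layout: adding a third book is a plain insert of one new record.
flat = ToyIndex()
for book_id in ("b1", "b2", "b3"):
    flat.index(book_id, {"name": "Multi G. Enre", "book_name": book_id})

print(nested.stored(), len(nested.live()))
print(flat.stored(), len(flat.live()))
```

In the nested case the old version stays around as a tombstoned entry until it is cleaned up, and the two unchanged books are written again. In the flat case nothing already stored is touched.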

Upvotes: 7
