Optimized way to store nested data in Elasticsearch

I was wondering what the best way is to store an object like the following in Elasticsearch:

physical_host_name: physicalOne
physical_host_cpu_model: x86_64
physical_host_cpu_num: 72
physical_host_mem_size: 792116312 KiB
physical_host_guests_list: 
{

    guestOne : 
    {
        guest_max_mem: 16384000 KiB
        guest_os_type: hvm
        guest_state: running
    }

    guestTwo : 
    {
        guest_max_mem: 11234000 KiB
        guest_os_type: hvm
        guest_state: paused
    }
}

I would like to be able to query by physical_host_name and get all data relevant to that host (including physical_host_guests_list), and also to query by guest name (for example 'guestOne') and get the data relevant to that guest only.

Should this all be in one index? What should each document look like?


Answers (1)

ibexit

Your design is a valid approach. The document itself would be pretty much the same as your example, just in JSON. Let's call this version "A". If anything changes on the host or on one of its guests, you'll need to find and then update that document. Consider using the host's FQDN as the document ID; this simplifies CRUD operations. Letting Elasticsearch generate the ID is absolutely fine too (and may be the better approach if your dataset is large enough to span multiple shards, since the ID affects how documents are distributed across shards).
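As a rough illustration of version "A", here is a minimal Python sketch that builds the single host document from the question's data and uses the host name as the document ID. It assumes the memory sizes are stored as plain integers in KiB (rather than strings like "792116312 KiB") so they can be indexed as numbers; the helper names `version_a_doc` and `set_guest_state` are hypothetical and no Elasticsearch client is involved.

```python
import copy
import json

# Hypothetical data mirroring the question's example; memory values are in KiB.
host = {
    "physical_host_name": "physicalOne",
    "physical_host_cpu_model": "x86_64",
    "physical_host_cpu_num": 72,
    "physical_host_mem_size": 792116312,
    "physical_host_guests_list": {
        "guestOne": {"guest_max_mem": 16384000, "guest_os_type": "hvm", "guest_state": "running"},
        "guestTwo": {"guest_max_mem": 11234000, "guest_os_type": "hvm", "guest_state": "paused"},
    },
}


def version_a_doc(host_record):
    """Return (doc_id, json_body) for version A: one document per physical host.

    The host name (ideally the FQDN) doubles as the document ID, so the
    document can be fetched or overwritten directly by ID.
    """
    return host_record["physical_host_name"], json.dumps(host_record)


def set_guest_state(host_record, guest_name, state):
    """Changing one guest in version A means rewriting the whole host document."""
    updated = copy.deepcopy(host_record)  # leave the caller's data untouched
    updated["physical_host_guests_list"][guest_name]["guest_state"] = state
    return updated
```

Note that keeping the guests keyed by name, exactly as in the question, turns every guest name into its own field in the index mapping, which is worth keeping in mind if the number of guests grows.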

I also see a version "B", where each guest is its own document and holds a reference to its host (the parent):

{"type": "guest", "name": "vm_a", "parent": "physicalOne", "cpu": {"model": "x86_64", "num": 16}, "mem": {"total": "16384000 KiB"}, "os": {"name": "hvm", "version": 10}, "state": "suspended"}

The host gets its own document, like this:

{"type": "host", "name": "physicalOne", "parent": null, "cpu": {"model": "x86_64", "num": 72}, "mem": {"total": "792116312 KiB"}, "os": {"name": "linux", "version": 3}, "state": "running"}
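One way to get from the question's nested record to the version "B" layout is to split it into one host document plus one document per guest, then send them through the Bulk API, whose body is newline-delimited JSON (an action line followed by a source line, with a trailing newline). The sketch below assumes memory totals are integers in KiB, that guest names are unique so they can serve as document IDs, and uses a hypothetical index name `inventory`; fields the question's data does not provide (host OS and state, guest CPU) are simply left out.

```python
import json

# Hypothetical input in the question's shape; memory values are in KiB.
host = {
    "physical_host_name": "physicalOne",
    "physical_host_cpu_model": "x86_64",
    "physical_host_cpu_num": 72,
    "physical_host_mem_size": 792116312,
    "physical_host_guests_list": {
        "guestOne": {"guest_max_mem": 16384000, "guest_os_type": "hvm", "guest_state": "running"},
        "guestTwo": {"guest_max_mem": 11234000, "guest_os_type": "hvm", "guest_state": "paused"},
    },
}


def to_version_b_docs(host_record):
    """Split one host record into a host document plus one document per guest."""
    host_name = host_record["physical_host_name"]
    docs = [{
        "type": "host",
        "name": host_name,
        "parent": None,
        "cpu": {"model": host_record["physical_host_cpu_model"],
                "num": host_record["physical_host_cpu_num"]},
        "mem": {"total": host_record["physical_host_mem_size"]},
    }]
    for guest_name, guest in host_record["physical_host_guests_list"].items():
        docs.append({
            "type": "guest",
            "name": guest_name,
            "parent": host_name,  # reference back to the host document
            "mem": {"total": guest["guest_max_mem"]},
            "os": {"name": guest["guest_os_type"]},
            "state": guest["guest_state"],
        })
    return docs


def to_bulk_ndjson(docs, index="inventory"):
    """Build a Bulk API body: one action line and one source line per document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["name"]}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the Bulk API requires a trailing newline
```

With this layout, a state change on one guest only touches that guest's document, rather than the whole host record.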

If you need an ordering for the guests or hosts, add an order field as well, but I would try to rely on a natural ordering instead (by host name, etc.). This simplifies adding and deleting documents a lot.

Going this way, you only need to update the document where the change occurred, and adding or deleting a guest is simple. Note also the object properties (cpu, mem, os): they clean up your documents and may lead to a better class design in an application that consumes or manages this data. This design also lets you query across all guests or all hosts. Its main advantage, though, is that host and guest documents share the same set of fields, which keeps the footprint (RAM/storage) of the index as low as possible.
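To make the two queries from the question concrete for version "B", here is a hedged sketch of the Query DSL bodies as Python dicts: "host plus all its guests" matches documents whose `name` or `parent` is the host name (sorted by `name`, the natural ordering mentioned above), and "one guest only" filters on `type` and `name`. It assumes `type`, `name`, `parent` and the other string fields are mapped as `keyword` so `term` queries match exact values; the mapping shown is one possible choice, not a requirement. The `matches` function is a tiny local stand-in for evaluating these clauses without a cluster, not Elasticsearch itself.

```python
# One possible explicit mapping (an assumption): exact-match strings as keyword,
# memory as long (values in KiB).
MAPPING = {
    "mappings": {
        "properties": {
            "type": {"type": "keyword"},
            "name": {"type": "keyword"},
            "parent": {"type": "keyword"},
            "state": {"type": "keyword"},
            "cpu": {"properties": {"model": {"type": "keyword"}, "num": {"type": "integer"}}},
            "mem": {"properties": {"total": {"type": "long"}}},
            "os": {"properties": {"name": {"type": "keyword"}, "version": {"type": "keyword"}}},
        }
    }
}


def host_with_guests_query(host_name):
    """The host document itself OR any guest whose parent is that host."""
    return {
        "query": {"bool": {
            "should": [{"term": {"name": host_name}}, {"term": {"parent": host_name}}],
            "minimum_should_match": 1,
        }},
        "sort": [{"name": "asc"}],
    }


def single_guest_query(guest_name):
    """Only the guest document with the given name."""
    return {"query": {"bool": {"filter": [
        {"term": {"type": "guest"}},
        {"term": {"name": guest_name}},
    ]}}}


def matches(doc, clause):
    """Local stand-in for the `term`/`bool` clauses above (filter and should only)."""
    if "term" in clause:
        field, value = next(iter(clause["term"].items()))
        return doc.get(field) == value
    if "bool" in clause:
        b = clause["bool"]
        filters_ok = all(matches(doc, c) for c in b.get("filter", []))
        should = b.get("should", [])
        should_ok = not should or sum(matches(doc, c) for c in should) >= b.get("minimum_should_match", 1)
        return filters_ok and should_ok
    raise ValueError(f"unsupported clause: {clause}")
```

Querying "all guests" or "all hosts" then reduces to a single `term` on `type`, which is part of why a shared field set across both document types is convenient.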

This design (B) is my preferred one, but it is just one of many imaginable. In the end, the design should support you in as many respects as possible while being optimized for what matters most to you, such as speed or ease of reading, writing, or analysis.
