Apostolos

Reputation: 8111

Elasticsearch data to more structured form

I have a two-level nested aggregation over the fields `['field1', 'field2']`. Both are terms aggregations. The way Elasticsearch returns aggregations isn't very convenient, with buckets nested inside buckets. I am having trouble transforming the Elasticsearch results into a list of dicts, e.g.

Mock Elasticsearch results:

'aggregations':{
    'field1':{
        'buckets':[
            {
                'key':'value1',
                'field2':{
                    'buckets':[
                        {
                            'key':'1.1.1.1',
                            'doc_count':15
                        },
                        {
                            'key': '2.2.2.2',
                            'doc_count': 12
                        }

                    ]

                }
            },
            {
                'key': 'value2',
                'field2': {
                    'buckets': [
                        {
                            'key': '3.3.3.3',
                            'doc_count': 15
                        },
                        {
                            'key': '4.4.4.4',
                            'doc_count': 12
                        }
                     ]
                 }

            },
            {
                'key': 'value3',
                'field2': {
                    'buckets': [
                        {
                            'key': '5.5.5.5',
                            'doc_count': 15
                        },
                        {
                            'key': '6.6.6.6',
                            'doc_count': 12
                        }
                     ]
                 }
            }
        ]
    }
}

I would like the result to be in this form:

[{'field1':'value1', 'field2':'1.1.1.1'}, 
 {'field1':'value1', 'field2':'2.2.2.2'},
 {'field1':'value2', 'field2':'3.3.3.3'},
 {'field1':'value2', 'field2':'4.4.4.4'},
 {'field1':'value3', 'field2':'5.5.5.5'},
 {'field1':'value3', 'field2':'6.6.6.6'} ]

like a normal database with rows and columns. The aggregation name must be used as the column name; this is a requirement. I have thought of building a tree representation of the data and then walking it with DFS to generate each row of the results, but I need a place to start.

Upvotes: 0

Views: 101

Answers (1)

Val

Reputation: 217514

If you load those JSON aggregation results into a dictionary (json.loads('{...}')), you can then iterate over them very simply with two nested loops:

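# agg is the dictionary holding the parsed Elasticsearch response
fields = []
for bucket in agg['aggregations']['field1']['buckets']:
    # each outer bucket carries its own list of field2 buckets
    for sub in bucket['field2']['buckets']:
        fields.append({'field1': bucket['key'], 'field2': sub['key']})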

After running this, the `fields` list will contain exactly what you need (the JSON below was obtained with json.dumps(fields)):

[
  {
    "field2": "1.1.1.1",
    "field1": "value1"
  },
  {
    "field2": "2.2.2.2",
    "field1": "value1"
  },
  {
    "field2": "3.3.3.3",
    "field1": "value2"
  },
  {
    "field2": "4.4.4.4",
    "field1": "value2"
  },
  {
    "field2": "5.5.5.5",
    "field1": "value3"
  },
  {
    "field2": "6.6.6.6",
    "field1": "value3"
  }
]
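
If the nesting can be deeper than two levels, the same idea can be written as a depth-first recursion, which is close to the tree/DFS approach mentioned in the question. The following is only a sketch: the function name `flatten_buckets`, the `names` parameter and the shortened `response` sample are illustrative assumptions, and it assumes every level is a terms aggregation whose buckets contain the next aggregation under its own name.

```python
def flatten_buckets(aggs, names, row=None):
    """Depth-first walk over nested terms aggregations.

    aggs:  dict keyed by aggregation name (the 'aggregations' dict or a bucket)
    names: aggregation names, ordered from outermost to innermost
    row:   column values collected so far on the current path
    """
    row = row or {}
    if not names:
        # Every level has been consumed: the path is one complete row.
        return [row]
    name, rest = names[0], names[1:]
    rows = []
    for bucket in aggs[name]['buckets']:
        # The aggregation name becomes the column name.
        rows.extend(flatten_buckets(bucket, rest, {**row, name: bucket['key']}))
    return rows


# Shortened sample in the same shape as the response in the question.
response = {
    'aggregations': {
        'field1': {'buckets': [
            {'key': 'value1', 'field2': {'buckets': [
                {'key': '1.1.1.1', 'doc_count': 15},
                {'key': '2.2.2.2', 'doc_count': 12},
            ]}},
            {'key': 'value2', 'field2': {'buckets': [
                {'key': '3.3.3.3', 'doc_count': 15},
            ]}},
        ]}
    }
}

rows = flatten_buckets(response['aggregations'], ['field1', 'field2'])
```

Adding a third level would then only require passing a longer `names` list, e.g. `['field1', 'field2', 'field3']`, provided the response is nested in that order.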

Upvotes: 0
