Parsing AWS ATHENA outputs

Question

Relatively new to Python here, coming from a node.js background, having quite a few issues parsing the output I get from get_query_results()

Documentation Link

I have been at this for some hours, i have tried iterating through the ['ResultSetMetadata']['ColumnInfo'] to grab the column names, but i don't know how to tie the ['ResultSet']['Data'] to these items so the code knows which name to apply to each dataValue.

I know i need to select the row headers then add the associated objects to those rows, but the logic on how to do such a thing in python escapes me.

I can see that the first column name always lines up with the first ['Data']['VarCharValue'] so I can get all the values in order, but if I loop through ['ResultSet']['Rows'] how do I isolate the first iteration as the column names to then populate with each other row?

Or is there a better way to do this?

Here is my json.dumps(ATHENAoutput)

{
  "ResultSet": {
    "Rows": [{
      "Data": [{
        "VarCharValue": "postcode"
      }, {
        "VarCharValue": "CountOf"
      }]
    }, {
      "Data": [{
        "VarCharValue": "1231"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "1166"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "3651"
      }, {
        "VarCharValue": "3"
      }]
    }, {
      "Data": [{
        "VarCharValue": "2171"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "4697"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "4450"
      }, {
        "VarCharValue": "2"
      }]
    }, {
      "Data": [{
        "VarCharValue": "4469"
      }, {
        "VarCharValue": "1"
      }]
    }],
      "ResultSetMetadata": {
        "ColumnInfo": [{
          "Scale": 0,
          "Name": "postcode",
          "Nullable": "UNKNOWN",
          "TableName": "",
          "Precision": 2147483647,
          "Label": "postcode",
          "CaseSensitive": true,
          "SchemaName": "",
          "Type": "varchar",
          "CatalogName": "hive"
        }, {
          "Scale": 0,
          "Name": "CountOf",
          "Nullable": "UNKNOWN",
          "TableName": "",
          "Precision": 19,
          "Label": "CountOf",
          "CaseSensitive": false,
          "SchemaName": "",
          "Type": "bigint",
          "CatalogName": "hive"
        }]
      }
  },
    "ResponseMetadata": {
      "RetryAttempts": 0,
        "HTTPStatusCode": 200,
          "RequestId": "18190e7c-901c-40b4-b6ef-10a5013b1a70",
            "HTTPHeaders": {
              "date": "Mon, 01 Oct 2018 04:51:14 GMT",
                "x-amzn-requestid": "18190e7c-901c-40b4-b6ef-10a5013b1a70",
                  "content-length": "1464",
                    "content-type": "application/x-amz-json-1.1",
                      "connection": "keep-alive"
            }
    }
}

My desired Result is a JSON Array like the following:

[{
  "postcode": "2171",
  "CountOf": "2"
}, {
  "postcode": "4697",
  "CountOf": "2"
}, {
  "postcode": "1166",
  "CountOf": "2"
},
 ...
]

G_M · Accepted Answer

>>> def get_var_char_values(d):
...     return [obj['VarCharValue'] for obj in d['Data']]
... 
... 
... header, *rows = input_data['ResultSet']['Rows']
... header = get_var_char_values(header)
... result = [dict(zip(header, get_var_char_values(row))) for row in rows]
>>> import json; print(json.dumps(result, indent=2))
[
  {
    "postcode": "4450",
    "CountOf": "2"
  },
  {
    "postcode": "1231",
    "CountOf": "2"
  },
  {
    "postcode": "4469",
    "CountOf": "1"
  },
  {
    "postcode": "3651",
    "CountOf": "3"
  },
  {
    "postcode": "1166",
    "CountOf": "2"
  },
  {
    "postcode": "4697",
    "CountOf": "2"
  },
  {
    "postcode": "2171",
    "CountOf": "2"
  }
]

Parsing AWS ATHENA outputs

Answers (2)

Related Questions