erPe
erPe

Reputation: 578

Querying nested JSON structures in AWS Athena

I got the following format of JSON document with nested structures

{
    "id": "p-1234-2132321-213213213-12312",
    "name": "athena to the rescue",
    "groups": [
        {
            "strategy_group": "anyOf",
            "conditions": [
                {
                    "strategy_conditions": "anyOf",
                    "entries": [
                        {
                            "c_key": "service",
                            "C_operation": "isOneOf",
                            "C_value": "mambo,bambo,jumbo"
                        },
                        {
                            "c_key": "hostname",
                            "C_operation": "is",
                            "C_value": "lols"
                        }
                    ]
                }
            ]
        }
    ],
    "tags": [
        "aaa",
        "bbb",
        "ccc"
    ]
}

I have created table in Athena to support it using the following

CREATE EXTERNAL TABLE IF NOT EXISTS filters ( id string, name string, tags array<string>, groups array<struct<
    strategy_group:string,
    conditions:array<struct<
        strategy_conditions:string,
        entries: array<struct<
            c_key:string,
            c_operation:string,
            c_value:string
        >>
    >>
>> ) row format serde 'org.openx.data.jsonserde.JsonSerDe' location 's3://filterios/policies/';

My goal at the moment is to query based on the conditions entries columns as well. I have tried some queries however sql language is not my biggest trade ;)

I got at the moment to this query which gives me entries

select cnds.entries from 
filters,
UNNEST(filters.groups) AS t(grps),
UNNEST(grps.conditions) AS t(cnds)

However since this is complex array it gives me some headeache what would be the proper way to query.

Any hints appreciated!

thanks R

Upvotes: 6

Views: 18031

Answers (2)

jbgorski
jbgorski

Reputation: 1939

I am not sure whether I understood your query well. Look at this example below, maybe it will be useful to you.

select 
    id, 
    name, 
    tags,
    grps.strategy_group,
    cnds.strategy_conditions,
    enes.c_key,
    enes.c_operation, 
    enes.c_value 
from 
    filters,
    UNNEST(filters.groups) AS t(grps),
    UNNEST(grps.conditions) AS t(cnds),
    UNNEST(cnds.entries) AS t(enes)
where 
    enes.c_key='service'

Upvotes: 8

user2117893
user2117893

Reputation: 111

Here is one example i recently worked with that may help:

My JSON:

{
"type": "FeatureCollection",
"features": [{
    "first": "raj",
    "geometry": {
        "type": "Point",
        "coordinates": [-117.06861096, 32.57889962]
    },
    "properties": "someprop"
}] 
}

Created external table :

CREATE EXTERNAL TABLE `jsondata`(
  `type` string COMMENT 'from deserializer', 
  `features` array<struct<type:string,geometry:struct<type:string,coordinates:array<string>>>> COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ( 
  'paths'='features,type') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://vicinitycheck/rawData/jsondata/'
TBLPROPERTIES (
  'classification'='json')

Query data :

SELECT type AS TypeEvent,
     features[1].geometry.coordinates AS FeatherType
FROM test_vicinitycheck.jsondata
WHERE type = 'FeatureCollection'

test_vicinitycheck - Is my database name in Athena
jsondata - table name in Athena

I documented some examples on my blog if it helps: http://weavetoconnect.com/aws-athena-and-nested-json/

Upvotes: 1

Related Questions