Reputation: 578
I got the following format of JSON document with nested structures
{
"id": "p-1234-2132321-213213213-12312",
"name": "athena to the rescue",
"groups": [
{
"strategy_group": "anyOf",
"conditions": [
{
"strategy_conditions": "anyOf",
"entries": [
{
"c_key": "service",
"C_operation": "isOneOf",
"C_value": "mambo,bambo,jumbo"
},
{
"c_key": "hostname",
"C_operation": "is",
"C_value": "lols"
}
]
}
]
}
],
"tags": [
"aaa",
"bbb",
"ccc"
]
}
I have created table in Athena to support it using the following
CREATE EXTERNAL TABLE IF NOT EXISTS filters ( id string, name string, tags array<string>, groups array<struct<
strategy_group:string,
conditions:array<struct<
strategy_conditions:string,
entries: array<struct<
c_key:string,
c_operation:string,
c_value:string
>>
>>
>> ) row format serde 'org.openx.data.jsonserde.JsonSerDe' location 's3://filterios/policies/';
My goal at the moment is to query based on the conditions entries columns as well. I have tried some queries however sql language is not my biggest trade ;)
I got at the moment to this query which gives me entries
select cnds.entries from
filters,
UNNEST(filters.groups) AS t(grps),
UNNEST(grps.conditions) AS t(cnds)
However since this is complex array it gives me some headeache what would be the proper way to query.
Any hints appreciated!
thanks R
Upvotes: 6
Views: 18031
Reputation: 1939
I am not sure whether I understood your query well. Look at this example below, maybe it will be useful to you.
select
id,
name,
tags,
grps.strategy_group,
cnds.strategy_conditions,
enes.c_key,
enes.c_operation,
enes.c_value
from
filters,
UNNEST(filters.groups) AS t(grps),
UNNEST(grps.conditions) AS t(cnds),
UNNEST(cnds.entries) AS t(enes)
where
enes.c_key='service'
Upvotes: 8
Reputation: 111
Here is one example i recently worked with that may help:
My JSON:
{
"type": "FeatureCollection",
"features": [{
"first": "raj",
"geometry": {
"type": "Point",
"coordinates": [-117.06861096, 32.57889962]
},
"properties": "someprop"
}]
}
Created external table :
CREATE EXTERNAL TABLE `jsondata`(
`type` string COMMENT 'from deserializer',
`features` array<struct<type:string,geometry:struct<type:string,coordinates:array<string>>>> COMMENT 'from deserializer')
ROW FORMAT SERDE
'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
'paths'='features,type')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://vicinitycheck/rawData/jsondata/'
TBLPROPERTIES (
'classification'='json')
Query data :
SELECT type AS TypeEvent,
features[1].geometry.coordinates AS FeatherType
FROM test_vicinitycheck.jsondata
WHERE type = 'FeatureCollection'
test_vicinitycheck - Is my database name in Athena
jsondata - table name in Athena
I documented some examples on my blog if it helps: http://weavetoconnect.com/aws-athena-and-nested-json/
Upvotes: 1