pdolinaj

Reputation: 1145

AWS Athena - Querying JSON - Searching for Values

I have nested JSON files on S3 and am trying to query them with Athena.

However, I am having trouble querying the nested JSON values.

My JSON file looks like this:

{
  "id": "17842007980192959",
  "acount_id": "17841401243773780",
  "stats": [
    {
      "name": "engagement",
      "period": "lifetime",
      "values": [
        {
          "value": 374
        }
      ],
      "title": "Engagement",
      "description": "Total number of likes and comments on the media object",
      "id": "17842007980192959/insights/engagement/lifetime"
    },
    {
      "name": "impressions",
      "period": "lifetime",
      "values": [
        {
          "value": 11125
        }
      ],
      "title": "Impressions",
      "description": "Total number of times the media object has been seen",
      "id": "17842007980192959/insights/impressions/lifetime"
    },
    {
      "name": "reach",
      "period": "lifetime",
      "values": [
        {
          "value": 8223
        }
      ],
      "title": "Reach",
      "description": "Total number of unique accounts that have seen the media object",
      "id": "17842007980192959/insights/reach/lifetime"
    },
    {
      "name": "saved",
      "period": "lifetime",
      "values": [
        {
          "value": 0
        }
      ],
      "title": "Saved",
      "description": "Total number of unique accounts that have saved the media object",
      "id": "17842007980192959/insights/saved/lifetime"
    }
  ],
  "import_date": "2017-12-04"
}

What I'm trying to do is query the value inside the "stats" field for the entry where name = "impressions".

So ideally something like:

SELECT id, account_id, stats.values.value WHERE stats.name='engagement'

AWS example: https://docs.aws.amazon.com/athena/latest/ug/searching-for-values.html

Any help would be appreciated.

Upvotes: 5

Views: 10622

Answers (1)

jens walter

Reputation: 14029

You can query the JSON with the following table definition:

CREATE EXTERNAL TABLE test (
  id string,
  acount_id string,
  stats array<
    struct<
      name:string,
      period:string,
      values:array<struct<value:string>>,
      title:string
    >
  >
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://bucket/';

With that table in place, the nested value column can be reached by unnesting both arrays (stats, then each stat's values):

SELECT id, acount_id, stat.name, x.value
FROM test
CROSS JOIN UNNEST(test.stats) AS st(stat)
CROSS JOIN UNNEST(stat."values") AS valx(x)
WHERE stat.name = 'engagement';
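
The question's text asks for the "impressions" entry rather than "engagement", so a variant of the same query may be closer to what is wanted. This is a minimal sketch, assuming the test table defined above; because that table declares value as a string, the CAST to integer and the column aliases stat_name and stat_value are additions for illustration, not part of the original answer.

```sql
-- Same double unnest as above, filtered on a different stat name.
-- Assumes the "test" table from the CREATE EXTERNAL TABLE statement.
SELECT
  id,
  acount_id,                                -- spelled as in the JSON key
  stat.name                 AS stat_name,   -- illustrative alias
  CAST(x.value AS integer)  AS stat_value   -- value is declared as string
FROM test
CROSS JOIN UNNEST(test.stats) AS st(stat)     -- one row per stats entry
CROSS JOIN UNNEST(stat."values") AS valx(x)   -- one row per values entry
WHERE stat.name = 'impressions';
```

For the sample document in the question, this should produce a single row whose stat_value is 11125. Changing the string in the WHERE clause selects any of the other stat names the same way.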

Upvotes: 4
