Alejandro de la Torre
Alejandro de la Torre

Reputation: 43

How to querying data from Amazon S3

I am looking to create a Tableau Dashboard with data originated on Amazon DynamoDB. Right now I am sending the data to a bucket on Amazon S3 using Amazon Lambda and I am getting this file on the S3 bucket,

{
  "Items": [
    {
      "payload": {
        "phase": "T",
        "tms_event": "2017-03-16 18:19:50",
        "id_UM": 0,
        "num_severity_level": 0,
        "event_value": 1,
        "int_status": 0
      },
      "deviceId": 6,
      "tms_event": "2017-03-16 18:19:50"
    }
  ]
}

I trying to use Amazon Athena to create a connection with Tableau but the payload attribute is giving me problems and I am not getting any results when I do the SELECT query.

This is the Athena Table,

CREATE EXTERNAL TABLE IF NOT EXISTS default.iot_table_test (
  `payload` map<string,string>,
  `deviceId` int,
  `tms_event` string 
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://iot-logging/'
TBLPROPERTIES ('has_encrypted_data'='false')

Thanks, Alejandro

Upvotes: 1

Views: 1232

Answers (1)

James
James

Reputation: 11931

Your table does not look like it matches your data, because your data has a top-level Items array. Without restructing the JSON data files, I think you would need a table definition like this:

CREATE EXTERNAL TABLE IF NOT EXISTS default.iot_table_test_items (
  `Items` ARRAY<
    STRUCT<
      `payload`: MAP<string, string>,
      `deviceId`: int,
      `tms_event`: string
    >
  >
) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
) LOCATION 's3://iot-logging/'
TBLPROPERTIES ('has_encrypted_data'='false')

and then query it unnesting the Items array:

SELECT 
  item.deviceId, 
  item.tms_event, 
  item.payload
FROM 
  default.iot_table_test_items
  CROSS JOIN UNNEST (Items) AS i (item)
LIMIT 10;

Upvotes: 3

Related Questions