jtwp
jtwp

Reputation: 11

Create AWS Athena table from json event

I am trying to create an AWS Athena table from a json event file stored in S3. I seem to be having trouble with format of my json event. The event is delivered in this format:

"[{\"String1\":123,\"String2\":\"abc\",\"String3\":\"def\"}]"

When I create the table it doesn't show any data as I don't think it can read the json string. My table creation code is:

CREATE EXTERNAL TABLE events (
String1 string,
String2 string,
String3 string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://events/';

I'm pretty sure I need other config to have the format of my json event to be parsed correctly, but I'm not sure what. If I set my json event to the below, it all creates and works as intended.

{"String1":"123","String2":"abc","String3":"def"}

Does anyone have any pointers as to what I need to do for my file format to be read/parsed correctly?

Thanks.


UPDATE

I've managed to get my json data delivered without the \, so now just need to handle the start and ending brackets [...],

[{"String1":123,"String2":"abc","String3":"def"}]

as this also causes an issue with my table, as puts all the data into the first row. Without the [...] it is placed correctly. I think I need to use an array, so looking at that.

Upvotes: 0

Views: 2291

Answers (1)

jtwp
jtwp

Reputation: 11

I've resolved this using an array, as most will have expected.

My Athena create table query was therefore:

CREATE EXTERNAL TABLE events (
`details` array<struct<
String1:string,
String2:string,
String3:string >>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://events/';

This gave me a parsed output that I could then run select's on or create a view to show my data in table format:

CREATE OR REPLACE VIEW "v_events" AS
SELECT
  item.string1,
  item.string2,
  item.string3
FROM
  (events
CROSS JOIN UNNEST("details") t (item))

Upvotes: 1

Related Questions