al.moorthi
al.moorthi

Reputation: 138

How do I load CSV file to Amazon Athena that contains JSON field

I have a CSV (tab separated) in s3 that needs to be queried on a JSON field.

uid\tname\taddress
1\tmoorthi\t{"rno":123,"code":400111}
2\tkiranp\t{"rno":124,"street":"kemp road"}

How can I query this data in Amazon Athena?

I should be able to query like:

select uid
from table1
where address['street']="kemp road";

Upvotes: 4

Views: 2264

Answers (1)

John Rotenstein
John Rotenstein

Reputation: 269370

You could try using the json_extract() command.

From Extracting Data from JSON - Amazon Athena:

You may have source data with containing JSON-encoded strings that you do not necessarily want to deserialize into a table in Athena. In this case, you can still run SQL operations on this data, using the JSON functions available in Presto.

WITH dataset AS (
  SELECT '{"name": "Susan Smith",
           "org": "engineering",
           "projects": [{"name":"project1", "completed":false},
           {"name":"project2", "completed":true}]}'
    AS blob
)
SELECT
  json_extract(blob, '$.name') AS name,
  json_extract(blob, '$.projects') AS projects
FROM dataset

This example shows how json_extract() can be used to extract fields from JSON. Thus, you might be able to do something like:

select uid
from table1
where json_extract(address, '$.street') = "kemp road";

Upvotes: 4

Related Questions