Reputation: 138
I have a CSV (tab separated) in s3 that needs to be queried on a JSON field.
uid\tname\taddress
1\tmoorthi\t{"rno":123,"code":400111}
2\tkiranp\t{"rno":124,"street":"kemp road"}
How can I query this data in Amazon Athena?
I should be able to query like:
select uid
from table1
where address['street']="kemp road";
Upvotes: 4
Views: 2264
Reputation: 269370
You could try using the json_extract()
command.
From Extracting Data from JSON - Amazon Athena:
You may have source data with containing JSON-encoded strings that you do not necessarily want to deserialize into a table in Athena. In this case, you can still run SQL operations on this data, using the JSON functions available in Presto.
WITH dataset AS (
SELECT '{"name": "Susan Smith",
"org": "engineering",
"projects": [{"name":"project1", "completed":false},
{"name":"project2", "completed":true}]}'
AS blob
)
SELECT
json_extract(blob, '$.name') AS name,
json_extract(blob, '$.projects') AS projects
FROM dataset
This example shows how json_extract()
can be used to extract fields from JSON. Thus, you might be able to do something like:
select uid
from table1
where json_extract(address, '$.street') = "kemp road";
Upvotes: 4