IndianGrandma007
IndianGrandma007

Reputation: 165

Query JSON file in Presto in S3

I have a file in S3, and Presto running on EMR. I see I can use Json_extract to read the json.

I am running the following query, however, I keep seeing null instead of the correct value.

select json_extract('s3a://random-s3-bucket/analytics/20210221/myjsonfile.json', '$.dateAvailability')

I see this output

enter image description here

Not sure if my syntax is wrong? Thoughts?

Upvotes: 0

Views: 2822

Answers (2)

brngyn
brngyn

Reputation: 151

If you need a tool to help you create the table statement, try this one: https://www.hivetablegenerator.com

From the page:

Easily convert any JSON (even complex Nested ones), CSV, TSV, or Log sample file to an Apache HiveQL DDL create table statement.

Upvotes: 1

Piotr Findeisen
Piotr Findeisen

Reputation: 20770

json_extract() operates on JSON scalar values kept in memory. It does not load data from an external location. See documentation page for usage examples.

In order to query a JSON file using Trino (formerly known as Presto SQL), you need to map it as a table with JSON format like this:

CREATE TABLE my_table ( .... )
WITH (
    format = 'JSON',
    external_location = 's3a://random-s3-bucket/analytics/20210221'
);

See more information in Hive connector documentation.

Upvotes: 1

Related Questions