deepAgrawal
deepAgrawal

Reputation: 733

JSON_Extract from list of json string

I want to extract some values for particular keys from a table with json string as below.

raw_data ...
{"label": "XXX", "lines":[{"amount":1000, "category": "A"}, {"amount":100, "category": "B"}, {"amount":10, "category": "C"}]} ...

I am expecting an outcome like

label amount category
XXX [1000, 100, 10] ['A', 'B', 'C']

I am using the following sql query to achieve that

select 
JSON_EXTRACT(raw_data, '$.lines[*].amount') AS amount,
JSON_EXTRACT(raw_data, '$.lines[*].category') AS category,
JSON_EXTRACT(raw_data, '$.label') AS label
from table

I can get a specific element of the list with [0] , [1] etc. But the sql code doesn't work with [*]. I am getting the following error Invalid JSON path: '$.lines[*].amount'

Edit

I am using Presto

Upvotes: 1

Views: 1372

Answers (3)

Guru Stron
Guru Stron

Reputation: 142173

Json path support in Presto is very limited, so you need to do some processing manually for example with casts and array functions:

-- sample data
with dataset (raw_data) as (
    values '{"label": "XXX", "lines":[{"amount":1000, "category": "A"}, {"amount":100, "category": "B"}, {"amount":10, "category": "C"}]}'
)

-- query
select label,
    transform(lines, l -> l['amount']) amount,
    transform(lines, l -> l['category']) category
from (
    select JSON_EXTRACT(raw_data, '$.label') AS label,
        cast(JSON_EXTRACT(raw_data, '$.lines') as array(map(varchar, json))) lines
    from dataset
);

Output:

label amount category
XXX [1000, 100, 10] ["A", "B", "C"]

In Trino json path support was vastly improved, so you can do next:

-- query
select JSON_EXTRACT(raw_data, '$.label') label,
       JSON_QUERY(raw_data, 'lax $.lines[*].amount' WITH ARRAY WRAPPER) amount,
       JSON_QUERY(raw_data, 'lax $.lines[*].category' WITH ARRAY WRAPPER) category
from dataset;

Upvotes: 1

Prabhakar Reddy
Prabhakar Reddy

Reputation: 5124

I was able to get the expected output using unnest to flatten and array_agg to aggregate in Presto. Below is the SQL used and output generated:

WITH dataset AS (
  SELECT
    * from sf_73535794
)
SELECT raw_data.label,array_agg(t.lines.amount) as amount,array_agg(t.lines.category) as category FROM dataset
CROSS JOIN UNNEST(raw_data.lines) as t(lines) group by 1

Output:

enter image description here

Upvotes: 0

Ajax1234
Ajax1234

Reputation: 71451

You can use json_table and json_arrayagg:

select json_extract(t.raw_data, '$.label'),
  (select json_arrayagg(t1.v) from json_table(t.raw_data, '$.lines[*]' columns (v int path '$.amount')) t1),
  (select json_arrayagg(t1.v) from json_table(t.raw_data, '$.lines[*]' columns (v text path '$.category')) t1)
from tbl t

Upvotes: 0

Related Questions