Reputation: 1055
So I have 2 Json arrays that need unnesting, and joining based on a key within the json structure. In theory is easy, but without having a 'left join unnest' functionality, it all becomes messy.
I have achieved what I want, by grouping the results; but I also have concerns that it is doing 2 cross joins, effectively generating many thousands of superfluous rows (in a live environment) before filtering them back out again.
Hence, my question here, is really looking for a much more efficient strategy to do the same logic. I'm well aware that my Presto experience & knowledge is n its infancy !
Thanks for any guidance !
Workings:
Basic logic : Each item in the 'left' array has an $.id value. For some of the 'left' items, there will be a matching right item with $.a.id value
Examples :
(1) Raw results of Cross Join
with cte as (
Select
123 as record_id,
'[ {"id":"01","key1":["val1"]}, {"id":"02","key1":["val2"]}, {"id":"03","key1":["val3"]} ]' as "left",
'[ {"a":{"id":"02","key1":["apples"]}, "b":{"lala":"bananas"}},{"a":{"id":"01","key1":["one"]}, "b":{"lala":"oneone"}} ]' as "right"
)
select
record_id,
l.i as "left",
r.i as "right",
json_extract(l.i, '$.id') as left_id,
json_extract(r.i, '$.a.id') as right_id
from
cte,
unnest(cast (json_parse("left") as array(json))) as l(i), -- left array
unnest(cast (json_parse("right") as array(json))) as r(i) -- right array
Output:
record_id | left | right | left_id | right_id |
---|---|---|---|---|
123 | {"id":"01","key1":["val1"]} | {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} | "01" | "02" |
123 | {"id":"01","key1":["val1"]} | {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} | "01" | "01" |
123 | {"id":"02","key1":["val2"]} | {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} | "02" | "02" |
123 | {"id":"02","key1":["val2"]} | {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} | "02" | "01" |
123 | {"id":"03","key1":["val3"]} | {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} | "03" | "02" |
123 | {"id":"03","key1":["val3"]} | {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} | "03" | "01" |
(2) Current Solution
select
record_id,
l.i as "left",
max( if(json_extract(l.i, '$.id') = json_extract(r.i, '$.a.id'),json_format(r.i),null) )as match
from
cte,
unnest(cast (json_parse("left") as array(json))) as l(i), -- left array
unnest(cast (json_parse("right") as array(json))) as r(i) -- right array
group by
record_id,
l.i
record_id | left | match |
---|---|---|
123 | {"id":"01","key1":["val1"]} | {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} |
123 | {"id":"02","key1":["val2"]} | {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} |
123 | {"id":"03","key1":["val3"]} |
Upvotes: 1
Views: 1664
Reputation: 38335
Unnest both arrays in CTEs and left join CTEs, in this case you will eliminate cross join, but the code is a bit longer:
with cte as (
Select
123 as record_id,
'[ {"id":"01","key1":["val1"]}, {"id":"02","key1":["val2"]}, {"id":"03","key1":["val3"]} ]' as "left",
'[ {"a":{"id":"02","key1":["apples"]}, "b":{"lala":"bananas"}},{"a":{"id":"01","key1":["one"]}, "b":{"lala":"oneone"}} ]' as "right"
),
"left" as (
select
record_id,
l.i as "left",
json_extract(l.i, '$.id') as left_id
from
cte,
unnest(cast (json_parse("left") as array(json))) as l(i) -- left array
),
"right" as (
select
record_id,
r.i as "right",
json_extract(r.i, '$.a.id') as right_id
from
cte,
unnest(cast (json_parse("right") as array(json))) as r(i) -- right array
)
select
l.record_id,
l."left",
r."right",
l.left_id,
r.right_id
from
"left" l left join "right" r on l.record_id=r.record_id and l.left_id=r.right_id
Result:
record_id | left | right | left_id | right_id |
---|---|---|---|---|
123 | {"id":"01","key1":["val1"]} | {"a":{"id":"01","key1":["one"]},"b":{"lala":"oneone"}} | "01" | "01" |
123 | {"id":"02","key1":["val2"]} | {"a":{"id":"02","key1":["apples"]},"b":{"lala":"bananas"}} | "02" | "02" |
123 | {"id":"03","key1":["val3"]} | \N | "03" | \N |
Upvotes: 1