windmark
windmark

Reputation: 78

Multiple Unioned Self-joins in BigQuery

I have a table with id, name and parent_id where parent_id is a parent hierarchy relating to id, see below.

id name parent_id
0 A null
1 B 0
2 C 1
3 D 1
4 E 2

I'm trying to create a nicer looking table with each id and its parent_id, including multiple levels up in the hierarchy. I use UNION and self-join to accomplish this, but I have a feeling there should be a nicer way of querying it with BigQuery's Standard SQL.

In the query below I go two levels, but you can imagine I want to go 5-6 levels.

WITH T1 as (
   select 0 as id, 'A' as name, null as parent_id union all
   select 1 as id, 'B' as name, 0 as parent_id union all
   select 2 as id, 'C' as name, 1 as parent_id union all
   select 3 as id, 'D' as name, 1 as parent_id union all
   select 4 as id, 'E' as name, 2 as parent_id
)

SELECT 
    a.id as id, 
    a.name as req_name,
FROM T1 as a
UNION ALL
SELECT  
    a.id as id,
    b.name as req_name,
FROM T1 as a
JOIN T1 as b ON a.parent_id = b.id
UNION ALL
SELECT 
    a.id as id,
    c.name as req_name,
FROM T1 as a
JOIN T1 as b on a.parent_id = b.id
JOIN T1 as c on b.parent_id = c.id

resulting in the table

id req_name
0 A
1 B
2 C
3 D
4 E
2 A
3 A
4 B
1 A
2 B
3 B
4 C

I would be thankful for any insights!

Upvotes: 0

Views: 284

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1271003

BigQuery does not (yet) support recursive or hierarchical queries. So your approach is actually fine. You can condense it, if you like, using left joins:

with t as (
   select 0 as id, 'A' as name, null as parent_id union all
   select 1 as id, 'B' as name, 0 as parent_id union all
   select 2 as id, 'C' as name, 1 as parent_id union all
   select 3 as id, 'D' as name, 1 as parent_id union all
   select 4 as id, 'E' as name, 2 as parent_id
)
select distinct id, t1.name
from t t1 left join 
     t t2
     on t2.parent_id = t1.id left join 
     t t3
     on t3.parent_id = t2.id cross join
     unnest(array[t1.id, t2.id, t3.id]) id
where id is not null;

You still need explicit joins to the maximum depth of the data.

The other alternative is to use a looping construct, which is available in the scripting language.

Upvotes: 1

Related Questions