Reputation: 656
I have a mapping table with three columns old_id1
, old_id2
and new_id
in a Google BigQuery table.
After performing some SQL operations on multiple other tables, finally I have a table data
where I have a column id
(with other columns) which can have either old_id1
, old_id2
or new_id
. I want to replace id
column values such that it only has new_id
s using mapping table to perform some aggregate functions.
It is also acceptable to create another column by joining two tables which consists only new_id
for each id
.
Logically,
If id == new_id, then pass
If id == old_id1 OR id == old_id2, then id = new_id
OR (new column)
For each id, find its corresponding new_id and if it exists put it in a new column.
EXAMPLE:
mapping_table
old_id1 old_id2 new_id
o1 o2 n1
oo1 oo2 n2
data
id amount
o1 100
o2 200
n1 300
oo2 100
n2 50
n3 25
If I try to find aggregate sum for each unique id, expected outcome:
id sum
n1 600
n2 150
n3 25
Upvotes: 0
Views: 1529
Reputation: 172993
Below is for BigQuery Standard SQL
#standardSQL
SELECT new_id AS id, SUM(amount) total_amount
FROM `project.dataset.data` d
JOIN `project.dataset.mapping_table` m
ON id IN (old_id1, old_id2, new_id)
GROUP BY id
if to apply to sample data from your question - result is
Row id total_amount
1 n1 600
2 n2 150
I didn't want to lose the entries from data table whose ids are not in mapping table in any column
#standardSQL
WITH map AS (
SELECT old_id1 id, new_id FROM `project.dataset.mapping_table` UNION ALL
SELECT old_id2, new_id FROM `project.dataset.mapping_table`
)
SELECT COALESCE(new_id, id) AS id, SUM(amount) total_amount
FROM `project.dataset.data` d
LEFT JOIN map m
USING(id)
GROUP BY 1
this will return
Row id total_amount
1 n1 600
2 n2 150
3 n3 25
Upvotes: 1