loadbox
loadbox

Reputation: 656

How to join only 1 column based on conditions on multiple columns in SQL?

I have a mapping table with three columns old_id1, old_id2 and new_id in a Google BigQuery table.

After performing some SQL operations on multiple other tables, finally I have a table data where I have a column id (with other columns) which can have either old_id1, old_id2 or new_id. I want to replace id column values such that it only has new_ids using mapping table to perform some aggregate functions.

It is also acceptable to create another column by joining two tables which consists only new_id for each id.

Logically,

If id == new_id, then pass
If id == old_id1 OR id == old_id2, then id = new_id

OR (new column)

For each id, find its corresponding new_id and if it exists put it in a new column.

EXAMPLE: mapping_table

old_id1 old_id2 new_id
o1       o2      n1
oo1      oo2     n2

data

id amount
o1 100
o2 200
n1 300
oo2 100
n2 50
n3 25

If I try to find aggregate sum for each unique id, expected outcome:

id sum
n1 600
n2 150
n3 25

Upvotes: 0

Views: 1529

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL

#standardSQL
SELECT new_id AS id, SUM(amount) total_amount
FROM `project.dataset.data` d
JOIN `project.dataset.mapping_table` m
ON id IN (old_id1, old_id2, new_id) 
GROUP BY id   

if to apply to sample data from your question - result is

Row id  total_amount     
1   n1  600  
2   n2  150    

I didn't want to lose the entries from data table whose ids are not in mapping table in any column

#standardSQL
WITH map AS (
  SELECT old_id1 id, new_id FROM `project.dataset.mapping_table` UNION ALL
  SELECT old_id2, new_id FROM `project.dataset.mapping_table` 
)
SELECT COALESCE(new_id, id) AS id, SUM(amount) total_amount
FROM `project.dataset.data` d
LEFT JOIN map m
USING(id)
GROUP BY 1

this will return

Row id  total_amount     
1   n1  600  
2   n2  150  
3   n3  25   

Upvotes: 1

Related Questions