Reputation: 2876
I'm facing this issue where I don't really know how to handle duplicate rows when joining two tables.
I have two tables I'd like to join
Value_x table :
ID Campaign Value_x
foo fb 1
foo twitter 3
bar fb 2
foobar ads 2
foo organic 3
Value_y table:
ID Campaign Value_y
foo organic 42
foobar organic 17
foo ads 12
foo twitter 23
bar twitter 35
Because of the Campaign column, ID are not unique in each table and for example, left join table output looks like this:
ID t1.Campaign t2.Campaign Value_x Value_y
foo fb organic 1 42
foo fb ads 1 12
Ideally I'd like to have the following output
ID t1.Campaign t2.Campaign Value_x Value_y
foo fb null 1 null
foo twitter twitter 3 23
foo organic organic 3 42
foo null ads null 12
How would you handle a situation like this? I read something about foreign keys
... I'm not familiar with this concept how can it help in this context? Or is there a way to perform a join at the campaign level?
Upvotes: 0
Views: 942
Reputation: 173046
Below is for BigQuery Standard SQL
#standardSQL
SELECT ID,
x.Campaign AS Campaign_x,
y.Campaign AS Campaign_y,
Value_x,
Value_y
FROM `project.dataset.tableX` x
FULL OUTER JOIN `project.dataset.tableY` y
USING(ID, Campaign)
You can test, play with above using sample data from your questions as in below example
#standardSQL
WITH `project.dataset.tableX` AS (
SELECT 'foo' ID, 'fb' Campaign, 1 Value_x UNION ALL
SELECT 'foo', 'twitter', 3 UNION ALL
SELECT 'bar', 'fb', 2 UNION ALL
SELECT 'foobar', 'ads', 2 UNION ALL
SELECT 'foo', 'organic', 3
), `project.dataset.tableY` AS (
SELECT 'foo' ID, 'organic' Campaign, 42 Value_y UNION ALL
SELECT 'foobar', 'organic', 17 UNION ALL
SELECT 'foo', 'ads', 12 UNION ALL
SELECT 'foo', 'twitter', 23 UNION ALL
SELECT 'bar', 'twitter', 35
)
SELECT ID,
x.Campaign AS Campaign_x,
y.Campaign AS Campaign_y,
Value_x,
Value_y
FROM `project.dataset.tableX` x
FULL OUTER JOIN `project.dataset.tableY` y
USING(ID, Campaign)
-- ORDER BY ID
with result
Row ID Campaign_x Campaign_y Value_x Value_y
1 bar fb null 2 null
2 bar null twitter null 35
3 foo fb null 1 null
4 foo twitter twitter 3 23
5 foo organic organic 3 42
6 foo null ads null 12
7 foobar ads null 2 null
8 foobar null organic null 17
Upvotes: 2
Reputation: 11205
Looks like you want a full outer join
, using campaign as a condition:
select coalesce(t1.id, t2.id) as id,
t1.campaign as t1_campaign,
t2.campaign as t2_campaign,
Value_x,
Value_y
from t1
full outer join t2
on t1.id = t2.id
and t1.campaign = t2.campaign
Upvotes: 2