Reputation: 55
I have a table with tracking data. Among other values the table has the columns traffic_medium, traffic_source and traffic_campaign. The columns do contain sometimes (none) or null as value.
I would like to match the sum of visitors from an other table using a left join with medium, scource and campaign as matching criteria.
This works fine if all columns contain data. It does not work, if one column has (none) or null as value.
I use BigQuery and legacy SQL.
SELECT
A.id,
A.trafficSource_medium,
A.trafficSource_source,
A.trafficSource_campaign,
B.sum_visitor AS sum_visitor
FROM [table] AS A
left outer join (Select
count(distinct fullvisitorID) as sum_visitor,
trafficSource_medium,
trafficSource_source,
trafficSource_campaign
FROM [table2]
GROUP BY trafficSource_medium,
trafficSource_source,
trafficSource_campaign)
AS B
on A.trafficSource_medium=B.trafficSource_medium AND
A.trafficSource_source=B.trafficSource_source AND
A.trafficSource_campaign=B.trafficSource_campaign
Thanks for your help!
Upvotes: 3
Views: 2249
Reputation: 172993
Try something like below
Assuming respective fields are of STRING type. If they are INT - replace 'n/a' with let's say -999 - important to choose constant that is not used as a value for respective field
#legacySQL
SELECT
A.id,
CASE WHEN A.trafficSource_medium = 'n/a' THEN NULL ELSE A.trafficSource_medium END AS trafficSource_medium,
CASE WHEN A.trafficSource_source = 'n/a' THEN NULL ELSE A.trafficSource_source END AS trafficSource_source,
CASE WHEN A.trafficSource_campaign = 'n/a' THEN NULL ELSE A.trafficSource_campaign END AS trafficSource_campaign,
B.sum_visitor AS sum_visitor
FROM (
SELECT
id,
IFNULL(trafficSource_medium, 'n/a') AS trafficSource_medium,
IFNULL(trafficSource_source, 'n/a') AS trafficSource_source,
IFNULL(trafficSource_campaign 'n/a') AS trafficSource_campaign
FROM [table]
) AS A
LEFT OUTER JOIN (
SELECT
COUNT(DISTINCT fullvisitorID) AS sum_visitor,
IFNULL(trafficSource_medium, 'n/a') AS trafficSource_medium,
IFNULL(trafficSource_source, 'n/a') AS trafficSource_source,
IFNULL(trafficSource_campaign 'n/a') AS trafficSource_campaign
FROM [table2]
GROUP BY
trafficSource_medium,
trafficSource_source,
trafficSource_campaign
) AS B
ON A.trafficSource_medium = B.trafficSource_medium
AND A.trafficSource_source = B.trafficSource_source
AND A.trafficSource_campaign = B.trafficSource_campaign
Idea here is to "transform" NULLs to some value so they are JOIN'able - and then "transform" it back to NULL in final SELECT
If you can migrate to Standard SQL - you can try below instead - it is less changes to do - just mostly in ON clause
#standardSQL
SELECT
A.id,
A.trafficSource_medium,
A.trafficSource_source,
A.trafficSource_campaign,
B.sum_visitor AS sum_visitor
FROM `table` AS A
LEFT OUTER JOIN (
SELECT
COUNT(DISTINCT fullvisitorID) AS sum_visitor,
trafficSource_medium,
trafficSource_source,
trafficSource_campaign
FROM `table2`
GROUP BY
trafficSource_medium,
trafficSource_source,
trafficSource_campaign
) AS B
ON IFNULL(A.trafficSource_medium, 'n/a') = IFNULL(B.trafficSource_medium, 'n/a')
AND IFNULL(A.trafficSource_source, 'n/a') = IFNULL(B.trafficSource_source, 'n/a')
AND IFNULL(A.trafficSource_campaign, 'n/a') = IFNULL(B.trafficSource_campaign, 'n/a')
Upvotes: 2