Albertok
Albertok

Reputation: 55

Left Join with multiple criteria which are partly empty or null

I have a table with tracking data. Among other values the table has the columns traffic_medium, traffic_source and traffic_campaign. The columns do contain sometimes (none) or null as value.

I would like to match the sum of visitors from an other table using a left join with medium, scource and campaign as matching criteria.

This works fine if all columns contain data. It does not work, if one column has (none) or null as value.

I use BigQuery and legacy SQL.

SELECT  
A.id,
A.trafficSource_medium,
A.trafficSource_source,
A.trafficSource_campaign,
B.sum_visitor AS sum_visitor

FROM [table] AS A
left outer join (Select 
count(distinct fullvisitorID) as sum_visitor,
trafficSource_medium,
trafficSource_source,
trafficSource_campaign
FROM [table2] 
GROUP BY trafficSource_medium,
trafficSource_source,
trafficSource_campaign)
AS B
on A.trafficSource_medium=B.trafficSource_medium AND     
A.trafficSource_source=B.trafficSource_source AND 
A.trafficSource_campaign=B.trafficSource_campaign

Thanks for your help!

Upvotes: 3

Views: 2249

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Try something like below
Assuming respective fields are of STRING type. If they are INT - replace 'n/a' with let's say -999 - important to choose constant that is not used as a value for respective field

#legacySQL
SELECT  
  A.id,
  CASE WHEN A.trafficSource_medium = 'n/a' THEN NULL ELSE A.trafficSource_medium END AS trafficSource_medium,
  CASE WHEN A.trafficSource_source = 'n/a' THEN NULL ELSE A.trafficSource_source END AS trafficSource_source,
  CASE WHEN A.trafficSource_campaign = 'n/a' THEN NULL ELSE A.trafficSource_campaign END AS trafficSource_campaign,
  B.sum_visitor AS sum_visitor
FROM (
  SELECT 
    id,
    IFNULL(trafficSource_medium, 'n/a') AS trafficSource_medium,
    IFNULL(trafficSource_source, 'n/a') AS trafficSource_source,
    IFNULL(trafficSource_campaign 'n/a') AS trafficSource_campaign
  FROM [table] 
) AS A
LEFT OUTER JOIN (
  SELECT 
    COUNT(DISTINCT fullvisitorID) AS sum_visitor,
    IFNULL(trafficSource_medium, 'n/a') AS trafficSource_medium,
    IFNULL(trafficSource_source, 'n/a') AS trafficSource_source,
    IFNULL(trafficSource_campaign 'n/a') AS trafficSource_campaign
  FROM [table2] 
  GROUP BY 
    trafficSource_medium,
    trafficSource_source,
    trafficSource_campaign
) AS B
ON A.trafficSource_medium = B.trafficSource_medium 
AND A.trafficSource_source = B.trafficSource_source 
AND A.trafficSource_campaign = B.trafficSource_campaign  

Idea here is to "transform" NULLs to some value so they are JOIN'able - and then "transform" it back to NULL in final SELECT

If you can migrate to Standard SQL - you can try below instead - it is less changes to do - just mostly in ON clause

#standardSQL
SELECT  
  A.id,
  A.trafficSource_medium,
  A.trafficSource_source,
  A.trafficSource_campaign,
  B.sum_visitor AS sum_visitor
FROM `table` AS A
LEFT OUTER JOIN (
  SELECT 
    COUNT(DISTINCT fullvisitorID) AS sum_visitor,
    trafficSource_medium,
    trafficSource_source,
    trafficSource_campaign
  FROM `table2`
  GROUP BY 
    trafficSource_medium,
    trafficSource_source,
    trafficSource_campaign
) AS B
ON IFNULL(A.trafficSource_medium, 'n/a') = IFNULL(B.trafficSource_medium, 'n/a') 
AND IFNULL(A.trafficSource_source, 'n/a') = IFNULL(B.trafficSource_source, 'n/a') 
AND IFNULL(A.trafficSource_campaign, 'n/a') = IFNULL(B.trafficSource_campaign, 'n/a') 

Upvotes: 2

Related Questions