Pino Gatto
Pino Gatto

Reputation: 21

Values increasing when JOIN Statement is added

I have been having some difficulties with this issue for a few days and I cannot figure it out.

(Running this is Bigquery)

For a marketing client, I am trying to obtain impressions and clicks per advertiser.

p_impressions has all the impressions data, including IDs Campaign_ID, Advertiser_ID p_click has all the click data, including IDs Campaign_ID, Advertiser_ID match_table_campaigns has campaign data: Campaign_ID, Campaign Name

(essentially my simple goal is to get the count of impressions and clicks and match it to the Campaign table so that I can see the name of the campaign and not just the ID.

When I run a query using UNION all for both the Impression and Click tables, using only the IDS, all the numbers match my platform and everything looks great.

However, when I insert a JOIN (I've tried all of them) to include the name of the campaign, all the metrics of Impressions and click seems to increase and I cannot seem to get the correct numbers anymore.

with combinedDeliveryData AS (
SELECT
  FORMAT_DATETIME("%F",DATETIME(TIMESTAMP_MILLIS( CAST(SUBSTR(CAST( Event_Time AS String),0,LENGTH(CAST( Event_Time AS String)) - 3) AS INT64 )),"America/Toronto")) AS Date,
  Impression_ID,
  DBM_Advertiser_ID,
  Campaign_ID,
  Ad_ID,
  Rendering_ID,
  Event_Type,
  Advertiser_ID,
FROM
  CampaignManager.p_impression
UNION ALL
SELECT
  FORMAT_DATETIME("%F",DATETIME(TIMESTAMP_MILLIS( CAST(SUBSTR(CAST( Event_Time AS String),0,LENGTH(CAST( Event_Time AS String)) - 3) AS INT64 )),"America/Toronto")) AS Date,
  Impression_ID,
  DBM_Advertiser_ID,
  Campaign_ID,
  Ad_ID,
  Rendering_ID,
  Event_Type,
  Advertiser_ID,
FROM
  CampaignManager.p_click)
  
SELECT cpg.Campaign_ID, cpg.Campaign,data.Date,COUNT(case data.Event_Type when 'VIEW' then 1 else null end) AS Impressions
FROM match_table_campaigns AS cpg
LEFT JOIN combinedDeliveryData AS data ON cpg.Campaign_ID = data.Campaign_ID
WHERE cpg.Advertiser_ID = "4739279"
GROUP BY 3,1,2

Any help would be greatly appreciated. I am not sure what I am missing.

AS a note, here is what works without the JOIN

SELECT
  Impression_ID,
  Event_Time,
  DBM_Advertiser_ID,
  Campaign_ID,
  Ad_ID,
  Rendering_ID,
  Event_Type,
  Advertiser_ID
FROM
  CampaignManager.p_impression
UNION ALL
SELECT
  Impression_ID,
  Event_Time,
  DBM_Advertiser_ID,
  Campaign_ID,
  Ad_ID,
  Rendering_ID,
  Event_Type,
  Advertiser_ID
FROM
  CampaignManager.p_click)
SELECT FORMAT_DATETIME("%F",DATETIME(TIMESTAMP_MILLIS( CAST(SUBSTR(CAST( clicks.Event_Time AS String),0,LENGTH(CAST( clicks.Event_Time AS String)) - 3) AS INT64 )),"America/Toronto")) AS exposureDate,clicks.Campaign_ID,COUNT(case Event_Type when 'VIEW' then 1 else null end) AS Impressions,COUNT(case Event_Type when 'CLICK' then 1 else null end) AS Clicks
FROM 
combinedDeliveryData AS clicks
WHERE clicks.Campaign_ID = "4739279"
GROUP BY 1,2

NEW ISSUE: Ive added the following code and things looked great

  this_is_deduplicated AS (
    SELECT x.*
    FROM (
      SELECT Campaign_ID, ARRAY_AGG(a LIMIT 1)[OFFSET(0)] x
      FROM combinedDeliveryData a
      GROUP BY 1
    ) 
  )

LEFT JOIN this_is_deduplicated AS data ON cpg.Campaign_ID = data.Campaign_ID

Getting all my values, and they are acurrate. I am now running into a nother issue where I have to join a final table, but the solution above is not working for it.

Values in new table: Date Campaign_ID Activity_ID

I need to now join this last table so that for each Date, I get the COUNT of Activity_ID As Conversiions.

My Final table should be:

Advertiser, Campaign, Impressions, Clicks, Conversions (As above)

Upvotes: 0

Views: 318

Answers (2)

Felipe Hoffa
Felipe Hoffa

Reputation: 59225

I'm almost sure the cause is having multiple identical campaign_id in the combinedDeliveryData table.

Make sure there's only one id per campaign before doing the join. With something like:

WITH [...], 
  this_is_deduplicated AS (
    SELECT x.*
    FROM (
      SELECT Campaign_ID, ARRAY_AGG(a LIMIT 1)[OFFSET(0)] x
      FROM combinedDeliveryData a
      GROUP BY 1
    ) 
  )

And JOIN against that as in:

LEFT JOIN this_is_deduplicated AS data ON cpg.Campaign_ID = data.Campaign_ID

instead of

LEFT JOIN combinedDeliveryData AS data ON cpg.Campaign_ID = data.Campaign_ID

or a shorter

LEFT JOIN this_is_deduplicated AS data USING(Campaign_ID)

Upvotes: 1

zealous
zealous

Reputation: 7503

You may need to join on campaign and campaign_id. Provided you sample query below

select
    t1.date,
    t1.campaign_id,
    t1.campaign,
    sum(case when t1.event = 'view' then 1 else 0 end) as impressions,
    sum(case when t2.event = 'click' then 1 else 0 end) as clicks
from table1 as t1
left join table2 as t2
on t1.campaign = t2.campaign
and on t1.campaign_id = t2.campaign_id

where advertiser_id = '12345'
group by
    t1.campaign_id,
    t1.campaign,
    t1.date

Upvotes: 0

Related Questions