Julie Nguyen
Julie Nguyen

Reputation: 39

Duplicates being generated by left join

I'm currently trying to reshape a table to aggregate email metrics by subscriber level. This is how the table I'm working with looks like:

 SELECT accountid,
       jobid,
       listid,
       batchid,
       subscriberkey,
       eventdate,
       eventtype,
       isunique,
       triggerersenddefinitionobjectid,
       triggeredsendcustomerkey,
       url,
       linkname,
       linkcontent,
       emailid,
       schedtime,
       pickuptime,
       deliveredtime,
       eventid,
       jobtype,
       jobstatus,
       emailname,
       emailsubject,
       sendtype,
       dynamicemailsubject,
       emailsenddefinition
FROM   email_metrics;  

I'm looking to reshape it so that for each unique combination of (subscriberkey + emailid), I have data on whether or not they opened that same email, and if they clicked on that email.

Example of how the current data would look (I condensed the table structure down to 3 columns for the sake of simplicity in explaining my issue, sorry not sure how to insert table here, so it might look confusing):

Record Example 1:

Subscriberkey | EmailID | Eventtype Open
1234          | 2       | Click
1234          | 2       |          

And I'm looking to essentially to reshape it to only one record for each unique (SubscriberKey, EmailName) combination:

SubscriberKey | EmailID2 | Is_Open | Is_Click 
1234          | 2        | True    | True

This will condense all data related to a specific Subscriber+Email send combo, showing me the related metrics on one record.

I was able to successfully do this before, but my laptop died recently and unfortunately my script is not retrievable :(

I've come up with the following so far, but I'm finding there are duplicates in my data generated from Left Joins, I'm having a bit of trouble understanding how I can ensure this doesn't happen to my data:

WITH email_sent AS (
    SELECT *
    FROM email_metrics em 
    WHERE eventtype ='Sent'
),
    email_open AS (
    SELECT *
    FROM email_metrics em2 
    WHERE eventtype ='Open'
    AND isunique = True),

    email_click AS (
    SELECT * 
    FROM email_metrics em3 
    WHERE eventtype='Click'
    AND isunique = True
)

SELECT DISTINCT a.jobid, 
    a.subscriberkey,
    a.send_time,
    a.emailid,
    a.emailname,
    a.emailsubject,
    a.dynamicemailsubject,
    a.emailsenddefinition,
    a.is_opened,
    a.open_date,
    COALESCE (c.eventtype,'Not Clicked') AS is_click,
    c.eventdate AS click_date,
    c.url,
    c.linkname,
    c.linkcontent
FROM
(SELECT DISTINCT s.jobid,
    s.subscriberkey,
    (s.eventdate) AS send_time,
    s.emailid,
    s.emailname,
    s.emailsubject,
    s.dynamicemailsubject,
    s.emailsenddefinition,
    COALESCE (o.eventtype, 'Not Opened') AS is_opened,
    (o.eventdate) AS open_date
FROM email_sent s 
LEFT JOIN email_open o ON (s.jobid=o.jobid AND s.subscriberkey=o.subscriberkey)) a
LEFT JOIN email_click c ON (a.jobid=c.jobid AND a.subscriberkey=c.subscriberkey);

Upvotes: 1

Views: 41

Answers (1)

GMB
GMB

Reputation: 222402

I would recommend just using conditional aggregation for this:

select
    subscriberkey,
    emailid,
    bool_or(eventtype = 'Open') Is_Open,
    bool_or(eventtype = 'Click') Is_Click
from email_metrics
group by subscriberkey, emailid

Upvotes: 1

Related Questions