Reputation: 1622
Having some issues with multiple left joins not doing what I expect them to!
select
sent.id,
sent.ts,
sent.email,
delivered.ts,
type.label,
min(opens.ts) as first_open,
count(opens.id) as open_count,
min(clicks.ts) as first_click,
count(clicks.id) as click_count
from sent
inner join type on type.id = sent.type_id
left outer join delivered on (delivered.id = sent.id)
left outer join opens on (opens.id = sent.id)
left outer join clicks on (clicks.id = sent.id)
where sent.id = 'a1b1c1d1e1'
group by
sent.id,
sent.ts,
sent.email,
delivered.ts,
type.label,
opens.id,
clicks.id
;
A message is sent, then delivered; that's 1 to 1, but, delivered may not exist.
A message can then be opened (multiple times) and clicked (multiple times), all tied together with the sent.id.
If I just have the opens join, it works fine however, likewise if I just have the clicks join.
When I add the clicks join the first_click
and click_count
show the same values as open.
I get:
1,2020-01-01 00:00:00,[email protected],2020-01-01 00:00:00,test,2020-01-01 01:00:00,4,2020-01-01 01:00:00,4
When it should be:
1,2020-01-01 00:00:00,[email protected],2020-01-01 00:00:00,test,2020-01-01 01:00:00,4,2020-01-01 02:00:00,1
I've tried running with no query cache (ALTER SESSION SET USE_CACHED_RESULT = false;
) and done a basic mirror in MySQL to prove the join and it's fine.
Upvotes: -1
Views: 5251
Reputation: 25978
so trying to bridge the gap between you problem description and the results you mention
start with known data
create or replace table sent (id text, ts timestamp_ntz, email text, type_id number);
create or replace table type (id number, label text);
create or replace table delivered(id text, ts timestamp_ntz);
create or replace table opens(id text, ts timestamp_ntz);
create or replace table clicks(id text, ts timestamp_ntz);
insert into sent values ('a1b1c1d1e1', '2020-01-01 01:00', '[email protected]', 1);
insert into delivered values ('a1b1c1d1e1', '2020-01-01 02:00');
insert into type values (1, 'test');
insert into opens values ('a1b1c1d1e1', '2020-01-01 03:00'),('a1b1c1d1e1', '2020-01-01 04:00'),('a1b1c1d1e1', '2020-01-01 05:00'),('a1b1c1d1e1', '2020-01-01 06:00');
insert into clicks values ('a1b1c1d1e1', '2020-01-01 07:00');
select
sent.id
,sent.ts
,sent.email
,delivered.ts
,type.label
,min(opens.ts) as first_open
,count(opens.id) as open_count
,min(clicks.ts) as first_click
,count(clicks.id) as click_count
from sent
join type on type.id = sent.type_id
left join delivered on (delivered.id = sent.id)
left join opens on (opens.id = sent.id)
left join clicks on (clicks.id = sent.id)
where sent.id = 'a1b1c1d1e1'
group by 1,2,3,4, 5;
I swapped the columns names to their position, because I like it that way, but you don't need to opens.id
or clicks.id
as those are a not selected in a non-aggregates colunms.
ID TS EMAIL TS LABEL FIRST_OPEN OPEN_COUNT FIRST_CLICK CLICK_COUNT
a1b1c1d1e1 2020-01-01 01:00:00.000 [email protected] 2020-01-01 02:00:00.000 test 2020-01-01 03:00:00.000 4 2020-01-01 07:00:00.000 4
I am not sure what join behavior you are changing.. but it might be helpful to print all the rows and see what happening to understand why you are not getting what you expect.
select
sent.id
,sent.ts
,sent.email
,delivered.ts
,type.label
,opens.ts as open_ts
,clicks.ts as click_ts
--,min(opens.ts) as first_open
--,count(opens.id) as open_count
--,min(clicks.ts) as first_click
--,count(clicks.id) as click_count
from sent
join type on type.id = sent.type_id
left join delivered on (delivered.id = sent.id)
left join opens on (opens.id = sent.id)
left join clicks on (clicks.id = sent.id)
where sent.id = 'a1b1c1d1e1'
--group by 1,2,3,4, 5;
gives for me:
ID TS EMAIL TS LABEL OPEN_TS CLICK_TS
a1b1c1d1e1 2020-01-01 01:00:00.000 [email protected] 2020-01-01 02:00:00.000 test 2020-01-01 03:00:00.000 2020-01-01 07:00:00.000
a1b1c1d1e1 2020-01-01 01:00:00.000 [email protected] 2020-01-01 02:00:00.000 test 2020-01-01 04:00:00.000 2020-01-01 07:00:00.000
a1b1c1d1e1 2020-01-01 01:00:00.000 [email protected] 2020-01-01 02:00:00.000 test 2020-01-01 05:00:00.000 2020-01-01 07:00:00.000
a1b1c1d1e1 2020-01-01 01:00:00.000 [email protected] 2020-01-01 02:00:00.000 test 2020-01-01 06:00:00.000 2020-01-01 07:00:00.000
which is what I would expect from ether LEFT or normal INNER joins.. feel free to update with the SQL the give you the broken results, and a listed version of of the output like above, to get a better explanation.
Upvotes: 2