Reputation: 59
Let's say, I have a table with the following columns:
date | event | user_id | unit_id |cost | ad_id | spend
03-15 | impression | 2353 | 3436 | 0.15 | NULL | NULL
03-15 | impression | 2353 | 3436 | 0.12 | NULL | NULL
03-15 | impression | 1234 | 5678 | 0.10 | NULL | NULL
03-15 | click | 1234 | 5678 | NULL | NULL | NULL
03-15 | create_ad | 1234 | 5678 | NULL | 6789 | 10
I want to calculate how many impressions on average it takes before a user creates an id. In this particular scenario, it took one impression for user 1234 to create an ad.
I'm not sure that I can somehow use date to discriminate events (but logically all these events should happen at different moments). However, you can see that impressions have NULLs in ad_id and spend, while create_id does have a number in spend.
This one doesn't work:
select i.user_id
, i.unit_id
, count(i.event) impressions_n
, count(c.event) as ads_n
from add4ad i
left
join add4ad c
on i.user_id = c.user_id
and i.unit_id = c.unit_id
where i.event in ('impression')
and c.spend <> NULL
group
by i.user_id
, i.unit_id
I have created a SQLFiddle with this data
Upvotes: 2
Views: 150
Reputation: 59
Seems this is the solution:
select sum(c.impressions_n) / count(1) as average_num_of_impressions from (
select count(i.event) as impressions_n
from add4ad i
join add4ad c
on i.user_id = c.user_id and i.unit_id = c.unit_id
where i.event in ('impression') and c.event in ('create_ad')
group by i.user_id, i.unit_id ) c
Upvotes: 0
Reputation: 22811
If I got it right, you need to count distinct ads
CREATE TABLE add4ad (`date` date, `event` varchar(10), `user_id` int,
`unit_id` int, `cost` float, `ad_id` float, `spend` float);
INSERT INTO add4ad (`date`, `Event`, `user_id`,`unit_id`,`cost`,`ad_id`,`spend`)
VALUES
('2018-03-15','impression','2353','3436','0.15',NULL,NULL),
('2018-03-15','impression','2353','3436','0.12',NULL,NULL),
('2018-03-15','impression','2353','3436','0.10',NULL,NULL),
('2018-03-15','impression','1234','5678','0.10',NULL,NULL),
('2018-03-15','click','1234','5678', NULL, NULL,NULL),
('2018-03-15','create_ad','1234','5678', NULL, 6789,10),
('2018-03-16','impression','8765','8871','0.10',NULL,NULL),
('2018-03-16','impression','8765','8871','0.10',NULL,NULL),
('2018-03-16','impression','8765','8871','0.2',NULL,NULL),
('2018-03-16','impression','8765','8871','0.23',NULL,NULL),
('2018-03-16','click','8765','8871', NULL, NULL,NULL),
('2018-03-16','create_ad','8765','8871', NULL, 6789,10);
select i.user_id, i.unit_id, count(i.event) as impressions_n,
count(distinct c.event) as ads_n
from add4ad i
join add4ad c
on i.user_id = c.user_id and i.unit_id = c.unit_id
where i.event in ('impression')
and c.event in ('create_ad') and c.spend is not NULL
group by i.user_id, i.unit_id
Returns
user_id unit_id impressions_n ads_n
1234 5678 1 1
8765 8871 4 1
I've replaced left join
with join
because where
as it is effectively makes your join inner
If you still need left join move predicates to ON
clause or handle NULLs in where.
Upvotes: 2
Reputation: 173
I went to SQL Fiddle and ran the test via MS SQL engine.
CREATE TABLE add4ad (date date, event varchar(10), user_id int,
unit_id int, cost float, ad_id float, spend float);
INSERT INTO add4ad (date, Event, user_id,unit_id,cost,ad_id,spend)
VALUES
('2018-03-15','impression','2353','3436','0.15',NULL,NULL),
('2018-03-15','impression','2353','3436','0.12',NULL,NULL),
('2018-03-15','impression','2353','3436','0.10',NULL,NULL),
('2018-03-15','click','1234','5678', NULL, NULL,NULL),
('2018-03-15','create_ad','2353','5678', NULL, 6789,10);
My query
with e10 as (select user_id, event, date, rowid=row_number() over (Partition by user_id order by date)
from add4ad
where event='create_ad'
),
e20 as ( -- get the first create_ad event
select user_id, date
from e10
where rowid=1
)
select a.user_id, count(1) as N
from e20 inner join add4ad a
on e20.user_id=a.user_id
and a.date<=e20.date
and a.event='impression'
group by a.user_id
Upvotes: 2
Reputation: 1713
The issue is for checking NULLS you have to use is NULL or is not NULL. Also your data in fiddle is incorrect. It does not have impression for 1234 in fiddle.
select i.user_id, i.unit_id, count(i.event) as impressions_n,
count(c.event) as ads_n
from add4ad i
left join add4ad c
on i.user_id = c.user_id and i.unit_id = c.unit_id
where i.event in ('impression')
/*and c.event in ('create_ad')*/ and c.spend is not NULL
group by i.user_id, i.unit_id
Upvotes: 0