ulm
ulm

Reputation: 59

Counting events before a specific event

Let's say, I have a table with the following columns:

date | event     | user_id |  unit_id  |cost |  ad_id  | spend
03-15 | impression | 2353   |   3436 | 0.15 |  NULL | NULL
03-15 | impression | 2353   |   3436 | 0.12 |  NULL | NULL
03-15 | impression | 1234   |   5678 | 0.10 |  NULL | NULL
03-15 | click | 1234   |   5678 |  NULL  |  NULL | NULL
03-15 | create_ad |     1234 | 5678 | NULL | 6789 | 10

I want to calculate how many impressions on average it takes before a user creates an id. In this particular scenario, it took one impression for user 1234 to create an ad.

I'm not sure that I can somehow use date to discriminate events (but logically all these events should happen at different moments). However, you can see that impressions have NULLs in ad_id and spend, while create_id does have a number in spend.

This one doesn't work:

select i.user_id
     , i.unit_id
     , count(i.event) impressions_n
     , count(c.event) as ads_n
  from add4ad i
   left 
  join add4ad c
     on i.user_id = c.user_id 
   and i.unit_id = c.unit_id
 where i.event in ('impression')
   and c.spend <> NULL
 group 
    by i.user_id
     , i.unit_id 

I have created a SQLFiddle with this data

Upvotes: 2

Views: 150

Answers (4)

ulm
ulm

Reputation: 59

Seems this is the solution:

select sum(c.impressions_n) / count(1) as average_num_of_impressions from (
select count(i.event) as impressions_n 
  from add4ad i 
  join add4ad c
     on i.user_id = c.user_id and i.unit_id = c.unit_id
 where i.event in ('impression') and c.event in ('create_ad')
 group by i.user_id, i.unit_id              ) c

Upvotes: 0

Serg
Serg

Reputation: 22811

If I got it right, you need to count distinct ads

CREATE TABLE add4ad (`date` date, `event` varchar(10), `user_id` int,
                   `unit_id` int, `cost` float, `ad_id` float, `spend` float);
INSERT INTO add4ad (`date`, `Event`, `user_id`,`unit_id`,`cost`,`ad_id`,`spend`)
VALUES
    ('2018-03-15','impression','2353','3436','0.15',NULL,NULL),
    ('2018-03-15','impression','2353','3436','0.12',NULL,NULL),
    ('2018-03-15','impression','2353','3436','0.10',NULL,NULL),
    ('2018-03-15','impression','1234','5678','0.10',NULL,NULL),
    ('2018-03-15','click','1234','5678', NULL, NULL,NULL),
    ('2018-03-15','create_ad','1234','5678', NULL, 6789,10),
    ('2018-03-16','impression','8765','8871','0.10',NULL,NULL),
    ('2018-03-16','impression','8765','8871','0.10',NULL,NULL),
    ('2018-03-16','impression','8765','8871','0.2',NULL,NULL),
    ('2018-03-16','impression','8765','8871','0.23',NULL,NULL),
    ('2018-03-16','click','8765','8871', NULL, NULL,NULL),
    ('2018-03-16','create_ad','8765','8871', NULL, 6789,10);

select i.user_id, i.unit_id, count(i.event) as impressions_n,
    count(distinct c.event) as ads_n
from add4ad i
join add4ad c
   on i.user_id = c.user_id and i.unit_id = c.unit_id
where i.event in ('impression')
   and c.event in ('create_ad') and c.spend is not NULL
group by i.user_id, i.unit_id 

Returns

user_id unit_id impressions_n   ads_n
1234    5678    1   1
8765    8871    4   1

I've replaced left join with join because where as it is effectively makes your join inner If you still need left join move predicates to ON clause or handle NULLs in where.

fiddle

Upvotes: 2

Seaport
Seaport

Reputation: 173

I went to SQL Fiddle and ran the test via MS SQL engine.

CREATE TABLE add4ad (date date, event varchar(10), user_id int,
                   unit_id int, cost float, ad_id float, spend float);
INSERT INTO add4ad (date, Event, user_id,unit_id,cost,ad_id,spend)
VALUES
    ('2018-03-15','impression','2353','3436','0.15',NULL,NULL),
    ('2018-03-15','impression','2353','3436','0.12',NULL,NULL),
    ('2018-03-15','impression','2353','3436','0.10',NULL,NULL),
    ('2018-03-15','click','1234','5678', NULL, NULL,NULL),
    ('2018-03-15','create_ad','2353','5678', NULL, 6789,10);

My query

with e10 as (select  user_id, event, date, rowid=row_number() over (Partition by user_id order by date)
from add4ad
where event='create_ad'
),
e20 as ( -- get the first create_ad event
select user_id, date
  from e10
  where rowid=1
  )
  select a.user_id, count(1) as N
  from e20 inner join add4ad a
  on e20.user_id=a.user_id
  and a.date<=e20.date
  and a.event='impression'
  group by a.user_id

Upvotes: 2

SteelFeather
SteelFeather

Reputation: 1713

The issue is for checking NULLS you have to use is NULL or is not NULL. Also your data in fiddle is incorrect. It does not have impression for 1234 in fiddle.

select i.user_id, i.unit_id, count(i.event) as impressions_n,
count(c.event) as ads_n
from add4ad i
 left join add4ad c
  on i.user_id = c.user_id and i.unit_id = c.unit_id
where i.event in ('impression')
/*and c.event in ('create_ad')*/ and c.spend is not NULL
group by i.user_id, i.unit_id 

Upvotes: 0

Related Questions