Choose aggregation based on respective date field

Question

I have metrics at the rep and client level:

select r.rep_month, c.client_month, 
count(distinct r.id) reps, count(distinct c.id) clients
from clients c
left join reps r on c.rep_id=r.id

This of course doesn't work because it gives all combinations of rep_month/client_month- and from a time series stand point- they should be calculated based on two different dates.

What I need is for reps to be calculated based on rep_month and clients to be calculated based on client_month, so there should just be one date in the output.

A generalized example is like so:

rep_date     client_date    reps    clients
3/1/18 0:00  8/1/17 0:00    14      24
3/1/18 0:00  2/1/17 0:00    4       6
3/1/18 0:00  12/1/17 0:00   9       12
3/1/18 0:00  1/1/18 0:00    14      16
3/1/18 0:00  10/1/17 0:00   11      11
3/1/18 0:00  12/1/16 0:00   4       7
3/1/18 0:00  1/1/17 0:00    1       1
3/1/18 0:00  4/1/17 0:00    4       4
3/1/18 0:00  3/1/17 0:00    12      14
3/1/18 0:00  11/1/17 0:00   5       7
3/1/18 0:00  5/1/17 0:00    4       5
3/1/18 0:00  11/1/16 0:00   1       1
3/1/18 0:00  2/1/18 0:00    5       5
3/1/18 0:00  8/1/16 0:00    2       2
3/1/18 0:00  9/1/17 0:00    16      20
3/1/18 0:00  (null)         49      0

This would be the expected output:

date    reps    clients
3/1/18  49      135

But please note that there can be cases where rep_date and client_date are not null, so combining the two into: coalesce(client_date,rep_date) won't work.

Thank you!

Gordon Linoff · Accepted Answer

I think you want something like this:

select mon, sum(reps) as reps, sum(clients) as clients 
from ((select c.client_month as mon, count(*) as clients, 0 as reps
       from clients c
       group by c.client_month
      ) union all
      (select r.rep_month, 0 as clients, count(*) as reps
       from reps r
       group by r.rep_month
      ) 
     ) rc
group by mon
order by mon;

Notes:

You can also do this with a join, but you have to deal with time periods that are missing from either table (i.e., you need full outer join and lots of coalesce()).
I am assuming that the ids are unique in each table, so count(*) and count(distinct id) do the same thing. The former is more efficient, because it does not incur overhead to remove duplicates.

Choose aggregation based on respective date field

Answers (2)

Related Questions