user8834780
user8834780

Reputation: 1670

Choose aggregation based on respective date field

I have metrics at the rep and client level:

select r.rep_month, c.client_month, 
count(distinct r.id) reps, count(distinct c.id) clients
from clients c
left join reps r on c.rep_id=r.id

This of course doesn't work because it gives all combinations of rep_month/client_month- and from a time series stand point- they should be calculated based on two different dates.

What I need is for reps to be calculated based on rep_month and clients to be calculated based on client_month, so there should just be one date in the output.

A generalized example is like so:

rep_date     client_date    reps    clients
3/1/18 0:00  8/1/17 0:00    14      24
3/1/18 0:00  2/1/17 0:00    4       6
3/1/18 0:00  12/1/17 0:00   9       12
3/1/18 0:00  1/1/18 0:00    14      16
3/1/18 0:00  10/1/17 0:00   11      11
3/1/18 0:00  12/1/16 0:00   4       7
3/1/18 0:00  1/1/17 0:00    1       1
3/1/18 0:00  4/1/17 0:00    4       4
3/1/18 0:00  3/1/17 0:00    12      14
3/1/18 0:00  11/1/17 0:00   5       7
3/1/18 0:00  5/1/17 0:00    4       5
3/1/18 0:00  11/1/16 0:00   1       1
3/1/18 0:00  2/1/18 0:00    5       5
3/1/18 0:00  8/1/16 0:00    2       2
3/1/18 0:00  9/1/17 0:00    16      20
3/1/18 0:00  (null)         49      0

This would be the expected output:

date    reps    clients
3/1/18  49      135

But please note that there can be cases where rep_date and client_date are not null, so combining the two into: coalesce(client_date,rep_date) won't work.

Thank you!

Upvotes: 0

Views: 36

Answers (2)

Laurie996
Laurie996

Reputation: 123

You can try:

   Select r.rep_month, c.client, r.reps from
     (select rep_month, count(distinct id) reps 
         from reps
         group by rep_month) r 
   left join  
     (select client_month, count(distinct id) clients 
         from clients 
         group by client_month) c
   on r.rep_month = c.client_month

It doesn't make sense that you can join on id, just join on month. Or, if there is a main table with the id in it, start with that table first. Plus, the clients table has to have all the months in it. If there are always reps every month, put that table first then left join.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269663

I think you want something like this:

select mon, sum(reps) as reps, sum(clients) as clients 
from ((select c.client_month as mon, count(*) as clients, 0 as reps
       from clients c
       group by c.client_month
      ) union all
      (select r.rep_month, 0 as clients, count(*) as reps
       from reps r
       group by r.rep_month
      ) 
     ) rc
group by mon
order by mon;

Notes:

  • You can also do this with a join, but you have to deal with time periods that are missing from either table (i.e., you need full outer join and lots of coalesce()).
  • I am assuming that the ids are unique in each table, so count(*) and count(distinct id) do the same thing. The former is more efficient, because it does not incur overhead to remove duplicates.

Upvotes: 1

Related Questions