Alan Wayne
Alan Wayne

Reputation: 5384

Count rows for each unique combination of columns in SQL

I would like to return a set of unique records from a table based on two columns along with the most recent posting time and a total count of the number of times the combination of those two columns has appeared before (in time) the record of their output.

So what I'm trying to get is something along these lines:

select col1, col2, max_posted, count from T
join (
 select col1, col2, max(posted) as posted  from T where groupid = "XXX" 
group by col1, col2) h
on ( T.col1 = h.col1 and
  T.col2 = h.col2 and
  T.max_posted = h.tposted)
where T.groupid = 'XXX'

Count needs to be the number of times EACH combination of col1 and col2 occurred BEFORE the max_posted of each record in the output. (I hope I explained that correctly :)

Edit: In trying the below suggestion as:

 select dx.*,
   count(*) over (partition by dx.cicd9, dx.cdesc order by dx.tposted) as   cnt
from dx
join (
select cicd9, cdesc, max(tposted) as tposted  from dx where groupid ="XXX" 
group by cicd9, cdesc) h
on ( dx.cicd9 = h.cicd9 and
  dx.cdesc = h.cdesc and
  dx.tposted = h.tposted)
where groupid =  'XXX';

The count always returns '1'. Additionally, how would you count only the records that occurred before tposted?

This also fails, but I hope you can get where I'm headed:

  WITH H AS (
    SELECT cicd9, cdesc, max(tposted) as tposted  from dx where groupid =  'XXX' 
    group by cicd9, cdesc), 
    J AS (
    SELECT  count(*) as cnt
    FROM dx, h
    WHERE dx.cicd9 = h.cicd9
      and dx.cdesc = h.cdesc
      and dx.tposted <= h.tposted
      and dx.groupid = 'XXX'
 )
SELECT H.*,J.cnt
FROM H,J 

Help anyone?

Upvotes: 2

Views: 2278

Answers (4)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656596

This was confusing:

Count needs to be the number of times EACH combination of col1 and col2 occurred BEFORE the max_posted of each record in the output.

Since, by definition, every record is "before" (or at the same time as) the latest post, this essentially means the total count per combination (ignoring the assumed off-by-one error in the sentence).

So this burns down to a simple GROUP BY:

SELECT cicd9, cdesc
     , max(posted) AS last_posted
     , count(*)    AS ct
FROM   dx
WHERE  groupid = 'XXX'
GROUP  BY 1, 2
ORDER  BY 1, 2;

Which does exactly the same as the currently accepted answer. Just a lot faster and simpler.

Upvotes: 0

Patrick
Patrick

Reputation: 32199

How about this:

SELECT DISTINCT ON (cicd9, cdesc) cicd9, cdesc,
  max(posted) OVER w AS last_post,
  count(*) OVER w AS num_posts
FROM dx
WHERE groupid = 'XXX'
WINDOW w AS (
  PARTITION BY cicd9, cdesc
  RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);

Given the lack of PG version, table definition, data and desired output this is just shooting from the hip, but the principle should work: Make a partition on the two columns where groupid = 'XXX', then find the maximum value of the posted column and the total number of rows in the window frame (hence the RANGE... clause in the window definition).

Upvotes: 1

Alan Wayne
Alan Wayne

Reputation: 5384

This was the best I could come up with -- better suggestions are welcome!

This will produce the results I need, with the understanding that count will always be at least 1 (from the join):

  SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*)
from dx 
join (
SELECT cicd9, cdesc, max(tposted) as tposted  from dx where groupid   =  'XXX' 
    group by cicd9, cdesc) h
on 
  (dx.cicd9 = h.cicd9 and dx.cdesc = h.cdesc and dx.tposted <= h.tposted 
  and dx.groupid = 'XXX')
group by dx.cicd9, dx.cdesc
order by dx.cdesc;

or

 WITH H AS (
    SELECT cicd9, cdesc, max(tposted) as tposted  from dx where groupid =  'XXX' 
    group by cicd9, cdesc)  
SELECT dx.cicd9, dx.cdesc, max(dx.tposted), count(*)
from dx, H
where dx.cicd9 = h.cicd9 and dx.cdesc = h.cdesc and dx.tposted <= h.tposted 
  and dx.groupid = 'XXX'
group by dx.cicd9, dx.cdesc
order by cdesc;

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269693

Do you just want a cumulative count?

select t.*,
       count(*) over (partition by col1, col2 order by posted) as cnt
from table t
where groupid = 'xxx';

Upvotes: 0

Related Questions