Josh Darnell
Josh Darnell

Reputation: 11433

At most N rows from each group

I have a SQL table containing customer information (we'll call it CustomerDB), including an address column. Many of the rows in this table have exact duplicate addresses. The business requirement is to retrieve rows from the database such that:

So, if I have a data set like this (I have also entered this in the Data Explorer, to make it easy to test queries against):

ID     NAME       COMPANY    ADDR1
16242  TOM E      Company A  101 First RD  
16241  RONALD J   Company B  12 Tenth AVE  
16235  KENNETH H             12 Tenth AVE  
16238  MICHAEL H  Company C  12 Tenth AVE  
16243  ANTONIO D  Company D  264 Long ST STE 5  
16237  MICHAEL B  Company E  264 Long ST STE 5  
16234  WALTER L   Company F  73 North RD  
16236  CARL O     Company G  73 North RD  
16239  MICHAEL S  Company H  73 North RD  
16240  MICHAEL I  Company I  73 North RD 

I would like to get all of those rows except for one of the "73 North RD" records. I hope I'm making some sense here.

My mind does not work very well in set-based operations, so I'm confused about how to do this. I would prefer a solution with an explanation of why it works. Since I'd like to be "taught how to fish", so to speak =)

Additional Information:

Upvotes: 3

Views: 311

Answers (1)

Marco
Marco

Reputation: 57583

A possible solution could be

SELECT c1.id, c1.name, c1.company, c1.addr1
FROM CustomerDB c1 LEFT JOIN CustomerDB c2
    ON (c1.addr1 = c2.addr1 and c1.id >= c2.id)
GROUP BY c1.addr1, c1.id, c1.name, c1.company
HAVING COUNT(*) <= 3​

The magic (nice definition you gave) is done by GROUP BY ... HAVING ... part.
Sure you know WHERE is used to filter data while selecting; well, HAVING is used in conjunction with GROUP BY to filter grouped data (so after grouping).
So I first group data (order is important) by address and then limit the count for each group to three.
Now let's talk about the inner part (you didn't ask this, but I thought it was good to provide you a complete example).
The ON part is used to tell the engine which fields are to be used to join tables; here I match tables on address and then I take records having an higher id (so, sorting ascending).

Upvotes: 3

Related Questions