Dinho
Dinho

Reputation: 724

How to group by a column with duplicate values while pulling all associated columns in table with SQL?

I have a table with 5 columns:

Party_ID | Site_ID | Party_Name | Group_Num | Code
---------+---------+------------+-----------+-----
123         A3D       MSFT        092079177   04
212         B5C       MSFT        092079177   05
313         S4F       MSFT        828539556   08
314         G3B       MSFT        828434232   09
515         KM9       MSFT        022901486   10
616         NM5       MSFT        022901486   09

DESIRED OUTPUT:

Party_ID | Site_ID | Party_Name | Group_Num | Code | Count
---------+---------+------------+-----------+------+--------
123         A3D       MSFT        092079177   04      12
212         B5C       MSFT        092079177   05      2
313         S4F       MSFT        828539556   08      3
314         G3B       MSFT        828434232   09      1 
515         KM9       MSFT        022901486   10      2 

I would like to group by 'Group_Num' which is a 9-digit value per row while pulling all associated fields data with it as a new table. There are duplicate values in Group_Num but are paired to unique values in other fields.

New to SQL and using Snowflake. I currently have this to count number of occurrences Group_Num has given Party_ID in query below:

SELECT * 
FROM F1_TABLE 
GROUP BY PARTY_ID
HAVING COUNT(DISTINCT Group_Num) > 1)
ORDER BY COUNT(DISTINCT Group_Num) DESC;

Upvotes: 0

Views: 4521

Answers (2)

NickW
NickW

Reputation: 9768

Not sure what exactly you are trying to achieve but if all you want is the 1st table with a count of Group Num appended to it then just do the count in a subquery and join to it e.g.

SELECT T1.*, T2.COUNTER
FROM F1_TABLE T1
INNER JOIN (SELECT Group_Num, COUNT(1) AS COUNTER
            FROM F1_TABLE
            GROUP BY Group_Num) AS T2
ON T1.GROUP_NUM = T2.GROUP_NUM

If you are trying to dedupe the main table in some way, as well, then do a SELECT DISTINCT or GROUP BY

Upvotes: 0

Mike Walton
Mike Walton

Reputation: 7339

So, not sure if you want to end up with a completely deduped table or not, but this window function + QUALIFY allow you to choose a record to keep for each dupe:

SELECT *
FROM F1_TABLE
QUALIFY row_number() OVER (PARTITION BY Group_Num ORDER BY Party_ID) = 1;

This example chooses the first Party_ID, but you can choose whatever you want (or let Snowflake choose for you by excluding the ORDER BY clause.

If this isn't what you are looking for, can you please provide what your resulting table would look like in your question?

Upvotes: 3

Related Questions