Reputation: 35
Below is a row from a SnowFlake query. This is the only row in the result with this information (i.e., this row is unique).
ID ACCOUNT_NUMBER DATE_1 DATE_2
123 347 2017-10-19 2017-10-29
I ran a GROUP BY like below to count the number of rows in each group. I got 3 for the above row. Shouldn't I get 1?
SELECT DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, COUNT(*)
FROM TABLE GROUP BY 1, 2, 3, 4;
ID ACCOUNT_NUMBER DATE_1 DATE_2 COUNT
123 347 2017-10-19 2017-10-29 3
I expected to see count of 1 for this row, but I got 3.
Upvotes: 1
Views: 469
Reputation: 175596
The result is correct. The DISTINCT
is applied after the grouping and has not effect in provided query.
Docs Typically, a SELECT statement’s clauses are evaluated in the order shown below:
- From
- Where
- Group by
- Having
- Window
- QUALIFY
- Distinct
- Order by
- Limit
Both below queries produces the same result:
SELECT DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, COUNT(*)
FROM TAB
GROUP BY 1, 2, 3, 4;
SELECT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, COUNT(*)
FROM TAB
GROUP BY 1, 2, 3, 4;
To apply DISTINCT it should be provided before grouping(subquery)
SELECT ID, ACCOUNT_NUMBER, DATE_1, DATE_2, COUNT(*)
FROM (SELECT DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2 FROM TAB)
GROUP BY 1, 2, 3, 4;
or as a part of aggregate function:
SELECT ID, ACCOUNT_NUMBER, DATE_1, DATE_2,
COUNT(DISTINCT ID, ACCOUNT_NUMBER, DATE_1, DATE_2)
FROM TAB
GROUP BY 1, 2, 3, 4;
For sample data:
CREATE OR REPLACE TABLE TAB(ID INT,
ACCOUNT_NUMBER INT,
DATE_1 TEXT,
DATE_2 TEXT)
AS
SELECT 123, 347, '2017-10-19', '2017-10-29' UNION ALL
SELECT 123, 347, '2017-10-19', '2017-10-29' UNION ALL
SELECT 123, 347, '2017-10-19', '2017-10-29';
Upvotes: 2