Czakan
Czakan

Reputation: 13

Counting unique lines with single select in SQL

So I have these two tables, one with duplicates and second one with data that is supposed to be unique. The thing is that some of the duplicates were not detected in the first select, and I cannot change it. New select must be used. To divide duplicates from unique data I made this simple select:

SELECT *, COUNT(Ref) as number
FROM [DB1].[DBO].[TB1] t
     FULL outer join [DB1].[DBO].[DUPLICATES] d
     ON t.Ref = d.Ref and t.ID = d.ID
     WHERE DUP_DATE = "2020-06-11"
GROUP BY t.Ref, t.ID,
HAVING COUNT(t.Ref) = 1

This is suposed to give me back the data that has no duplicated information in column Ref. While trying to exe this select it gives me back this error:

Msg 144, Level 15, State 1, Line 7 Cannot use an aggregate or a subquery in an expression used for the group by list of a GROUP BY clause

As I'm not familiar with T-SQL I would be grateful for any support provided. I tried some of the previously proposed solutions (here on stack), yet none of them worked for me.

EDIT: Here is the sample:

TB1
| REF    | DUP_DATE  |
| ------ | --------- |
| 36676  | Chicago   |
| 24671  | Singapur  |
| 89783  | Rotterdam |
| 85476  | Warsaw    |
| 12567  | Sydney    |
| 98437  | NYC       |

TB Duplicates
| REF   | DUP_DATE    |
| ----- | ----------- |
| 24671 | Mexico City |
| 89783 | Oxford      |

The output I want to get:

| REF   | DUP_DATE | Number |
| ----- | -------- | ------ |
| 36676 | Chicago  |    1   |
| 85476 | Warsaw   |    1   |
| 12567 | Sydney   |    1   |
| 98437 | NYC      |    1   |

Upvotes: 1

Views: 82

Answers (2)

Joel Coehoorn
Joel Coehoorn

Reputation: 415725

You can't use * with GROUP BY. Limit the SELECT clause to items used in the GROUP BY clause and aggregate/grouping functions like SUM() and AVG().

Also, the COUNT(t.Ref) = 1 expression, in addition to the type, should probably be in a HAVING clause and the double quotes around the date literal don't mean what you think in SQL.

Finally, this is the smallest of nitpicks, but date-only literals in Sql Server should use the unseparated variant of ISO8601.

SELECT t.Ref, t.ID, COUNT(Ref) as number
FROM [DB1].[DBO].[TB1] t
FULL JOIN [DB1].[DBO].[DUPLICATES] d
     ON t.Ref = d.Ref and t.ID = d.ID
WHERE DUP_DATE = '20200611'
GROUP BY t.Ref, t.ID
HAVING COUNT(t.Ref) = 1

Upvotes: 2

Srijon Chakraborty
Srijon Chakraborty

Reputation: 2154

I think DUP_DATE column should be datetime type however, it is look like a varchar column. That's why I have used DUP_DATE as Varchar. Finally, here is a solution with exact output =>

DECLARE @TB1 Table (REF INT, DUP_DATE Varchar(50));
DECLARE @TBDuplicates Table (REF INT, DUP_DATE Varchar(50));
INSERT INTO @TB1 VALUES (36676,'Chicago');
INSERT INTO @TB1 VALUES (24671,'Singapur');
INSERT INTO @TB1 VALUES (89783,'Rotterdam');
INSERT INTO @TB1 VALUES (85476,'Warsaw');
INSERT INTO @TB1 VALUES (12567,'Sydney');
INSERT INTO @TB1 VALUES (98437,'NYC');

INSERT INTO @TBDuplicates VALUES (24671,'Mexico City');
INSERT INTO @TBDuplicates VALUES (89783,'Oxford');

SELECT t.REF,t.DUP_DATE,COUNT(*) Number FROM @TB1 t
LEFT JOIN @TBDuplicates t1 ON t.REF=t1.REF
WHERE t1.REF IS NULL
GROUP BY  t.REF,t.DUP_DATE
ORDER BY t.DUP_DATE,t.REF

Upvotes: 0

Related Questions