SELECT all columns from distinct two columns

Question

I need to SELECT * where distinct 'Rua' with different 'CP' in the following DB:

id      Rua                  Local           Conc       CP
81143   dos moinhos          Rio Tinto       Gondomar   0123
81142   dos Moinhos          Rio Tinto       Gondomar   4435
81141   dos Moinhos          Rio Tinto       Gondomar   4435

With the following query, i am able to obtain the two columns:

SELECT Rua, CP 
FROM  Codigo
GROUP BY Rua, CP
HAVING COUNT(*) = 1

But i want all the columns. SELECT * returns "Column 'Codigo.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."

So i came upon this answer: How do I (or can I) SELECT DISTINCT on multiple columns?

I didn't go for the accepted answer, as i need a fast query (this will be used for AJAX search suggestions). I used the other highly upvoted answer. So, i created the following query:

SELECT * From Codigo
WHERE  (Rua,CP) IN (
   SELECT Rua, CP
   FROM   Codigo
   GROUP  BY Rua, CP
   HAVING COUNT(*) = 1 
);

That returns An expression of non-boolean type specified in a context where a condition is expected, near ',' error. Why it expects Rua after WHERE to be boolean? In the answer they used saleprice which don't seem boolean.

So my question is how can i select all columns including the rows with same Rua and CP, but those only once (DISTINCT)?

Pரதீப் · Accepted Answer

SQL Server does not support (a,b) in (val1,val2) syntax

Use EXISTS to check the pair

SELECT *
FROM   Codigo C1
WHERE  EXISTS (SELECT 1
               FROM   Codigo C2
               WHERE  C1.Rua = C2.Rua
                      AND C1.CP = C2.CP
               HAVING Count(*) = 1);

or use COUNT() Over() Window aggregate function to count the records for each Rua, CP combination and filter the group which is having count as 1 to find one Rua row per each different CP.

SELECT *
FROM   (SELECT *,
               Count(1)OVER(partition BY Rua, CP) AS cnt
        FROM   Codigo C1) A
WHERE  cnt = 1

This will be my preferred approach since it is much efficient than using EXISTS the optimizer has to scan/seek the table only once

If you don't want duplicate records then use ROW_NUMBER()

SELECT *
FROM   (SELECT *,
               Row_Number()OVER(partition BY Rua, CP Order by id) AS RN
        FROM   Codigo C1) A
WHERE  RN = 1

I have ordered the duplicate records using Id column and picked the first record out of duplicates.

SELECT all columns from distinct two columns

Answers (2)

Related Questions