user1951
user1951

Reputation: 1541

SQL: Data Cleaning

I am facing a problem which I do not know how to categorize. So, pardon me for the generic title. I have a dataset like:

Table1: Column1, Column2, Column3. According to my business logic, for a pair of 'Column1 Column2', the Column3 can have only one unique value. So below table is a problematic one because of the second entry:

Table1
Column1  Column2  Column3
A1       B1       R
A1       B1       O   << ERROR! for A1-B1 pair only one value on column3 is accepted
A2       B2       R
A2       B3       J
A3       B3       K
A4       B5       K

From above table I would like to find the problematic entries:

A1       B1       R
A1       B1       O

Thanks in advance for your help !

Upvotes: 0

Views: 80

Answers (1)

Steven B.
Steven B.

Reputation: 342

Using your example column names, you can run the following query to just see the Column1/Column2 pairs that have more than 1 value in Column 3.

SELECT Column1, Column2, COUNT(DISTINCT Column3) as Column3
FROM Table1
GROUP BY Column1, Column2
HAVING COUNT(DISTINCT Column3) > 1

You can omit the HAVING line to see the complete list of Column1/Column2 pairs.

Upvotes: 2

Related Questions