Huzaifa
Huzaifa

Reputation: 1171

SQL query to find False Negatives w.r.t Data Matching or Entity Resolution

Suppose I have two different ID's assigned for same record. For example

RecordID | ID1 | ID2
--------------------
1        | X   | A
2        | X   | B
3        | Y   | C
4        | Y   | C
5        | Y   | D
6        | Z   | E
7        | Z   | E

Now, I want to get the records where ID1 is assigned to the same value where as ID2 is assigned to a different value.

For example, I want to get:

1, X, A
2, X, B

Here ID1 assigned it X where as ID2 assigned it A and B, two different values.

Is it possible to write a query in SQL or SQL server that will return such records?

Upvotes: 2

Views: 375

Answers (4)

Andriy M
Andriy M

Reputation: 77677

If this is SQL Server 2005+:

WITH minmax AS (
  SELECT
    *,
    minID2 = MIN(ID2) OVER (PARTITION BY ID1),
    maxID2 = MAX(ID2) OVER (PARTITION BY ID1)
  FROM atable
)
SELECT
  RecordID,
  ID1,
  ID2
FROM minmax
WHERE minID2 <> maxID2
;

In the minmax CTE, two more columns are added which hold minimum and maximum ID2 for every group of rows with the same ID1. The main query returns only those rows where the corresponding minimum ID2 doesn't match the maximum ID2.

Upvotes: 1

fthiella
fthiella

Reputation: 49049

I think you are looking for this:

SELECT RecordID, ID1, ID2
FROM yourtable
WHERE ID1 IN (SELECT ID1
              FROM yourtable
              GROUP BY ID1
              HAVING COUNT(DISTINCT ID2)>1);

See fiddle here.

Upvotes: 1

sgeddes
sgeddes

Reputation: 62841

Assuming I'm understanding your requirements, I think all you need is an INNER JOIN:

SELECT DISTINCT T.*
FROM YourTable T
  JOIN YourTable T2 ON T.ID1 = T2.ID1 AND T.ID2 <> T2.ID2

And here is the SQL Fiddle.

Please note, in this example it returns all rows from X and Y. X because of A and B; Y because of C and D. Is this correct?

Good luck.

Upvotes: 1

Patashu
Patashu

Reputation: 21773

You need to use a subquery where, for each row, you poke through the table and see if any other rows match a certain criteria related to it.

pseudo sql:

select 
      t1.id, 
      t1.variable, 
      t1.value 
   from 
      table t1 
   where 
      exists ( select 1 
                  from t2 
                  where t2.id != t1.id 
                    and t2.variable == t1.variable 
                    and t2.value != t1.value)

Upvotes: 1

Related Questions