Surya Krishna
Surya Krishna

Reputation: 19

BIGQUERY - Query to find all UUID's which are having unique SSN and DOB values from joining 2 tables based on UUID column

Let's say there are 2 tables.

Table 1 : match
Table 2 : normalized

MATCH TABLE : COLUMN NAME : UUID

NORMALIZED TABLE : COLUMN NAMES : UUID, SSN, DOB

In MATCH table there is a column "UUID" and in NORMALIZED table there are columns 'UUID', SSN and DOB. After joining these two tables, need to fetch the uuid which are having unique SSN and DOB

Need to fetch the matched UUIDs records on the below tables by joining 2 tables

MATCH TABLE : UUID Column

Record # uuid
1 123456789
2 123456789
3 123456789
4 123456789
5 998764556
6 112233445

NORMALIZED TABLE : UUID, SSN, DOB columns

Record # uuid ssn dob comment
1 123456789 54321 03/08/1986 SSN is repeated in record #3
2 123456789 98765 03/08/1986 DOB is repeated in record #1
3 123456789 54321 03/08/1986 SSN is repeated in record #1
4 123456789 77654 10/10/2000 SSN is repeated in record #5
5 998764556 77654 11/11/1960 SSN is repeated in record #4
6 112233445 66546 07/14/1970 SSN and DOB is unique

OUTPUT : Need to display "All UUID's which have SSN and DOB are unique".

Record # uuid
6 112233445

This record should display in the output as this UUID has both SSN and DOB are unique from the table.

I have tried with the below queries:

Scenario 1 : Find all UUIDs with Unique SSN

QUERY : select * from MATCH_TABLE where UUID in (select DISTINCT UUID from NORMALIZED_TABLE A,MATCH_TABLE B where A.uuid=B.uuid and A.SSN is not null AND A.SSN in (SELECT C.SSN from NORMALIZED_TABLE C GROUP BY C.SSN having count(1)<2)

Scenario 2 : Find all UUIDs with Unique DOB

QUERY: select * from MATCH_TABLE where UUID in (select DISTINCT UUID from NORMALIZED_TABLE A,MATCH_TABLE B where A.uuid=B.uuid and A.DOB is not null AND A.DOB in (SELECT C.DOB from NORMALIZED_TABLE C GROUP BY C.DOB having count(1)<2)

Scenario 3 : Find all UUIDs with Unique SSN and DOB.

QUERY: select * from MATCH where UUID in (select distinct a.uuid from NORMALIZED a,MATCH b where a.uuid=b.uuid and a.ssn is not null and a.dob is not null and CONCAT(a.ssn,a.dob) in (select concat(c.ssn,c.dob) from NORMALIZED c group by c.ssn,c.dob having count(1)<2))

I'm not getting the correct output. Can anyone please help me with the query.

Upvotes: 1

Views: 383

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173046

Another solution would be

select * from (
  select as value any_value(t) from `project.dataset.table` t 
  group by ssn having count(1) = 1
) intersect distinct 
select * from (
  select as value any_value(t) from `project.dataset.table` t 
  group by dob having count(1) = 1
)   

if applied to sample data in your question - output is

enter image description here

Upvotes: 1

Sergey Geron
Sergey Geron

Reputation: 10212

Try windowed count(*) over(partition by column_name):

with mytable as (
  select 1 as record, 123456789 as uuid, 54321 as ssn, '03/08/1986' as dob union all
  select 2, 123456789, 98765, '03/08/1986' union all
  select 3, 123456789, 54321, '03/08/1986' union all
  select 4, 123456789, 77654, '10/10/2000' union all
  select 5, 998764556, 77654, '11/11/1960' union all
  select 6, 112233445, 66546, '07/14/1970'
)
select *
from (
  select 
    *,
    count(*) over (partition by ssn) as ssn_count,
    count(*) over (partition by dob) as dob_count
  from mytable
)
where ssn_count = 1 and dob_count = 1

enter image description here

Upvotes: 0

Related Questions