Ridhi
Ridhi

Reputation: 41

BiqQuery - How to find unique ids overlapping for multiple categories in a column?

I am very new to BigQuery and standard SQL. I might not be able to figure out the correct approach to solve a problem. Please help me out.

I have a Country Column and a ID column. Example shown below:

Country | ID
USA | id_1
USA | id_2
USA | id_1
UK | id_1
UK | id_1
UK | id_2
UK | id_3
AUS | id_3
AUS | id_4
AUS | id_2

I want a resulting column such as:

Output 1: UNIQUE OVERLAPPING ID values across all the countries

Country | Unique_overlapping_ids
USA | 2
UK | 3
AUS | 2

Output 2: UNIQUE non-OVERLAPPING ID values across all the countries

Country | Non_Unique_overlapping_ids
USA | 0
UK | 0
AUS | 1

I have 88 distinct Countries and over 5M unique IDs

Please help me. Thank you for your time and patience.

Upvotes: 0

Views: 607

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173046

Below is for BigQuery Standard SQL

#standardSQL
select country, 
  count(distinct if(shared, id, null)) as Unique_overlapping_ids,
  count(distinct if(shared, null, id)) as Unique_non_overlapping_ids
from `project.dataset.table` 
join (
  select id, count(distinct country) > 1 shared
  from `project.dataset.table`
  group by id
) using(id)
group by country   

if to apply to sample data from your question - output is

enter image description here

Upvotes: 1

Related Questions