Timogavk
Timogavk

Reputation: 869

How filter rows by matched values using BigQuery?

I have a table in BigQuery

SELECT 1 as big_id, 1 as temp_id, '101' as names
      UNION ALL SELECT 1,1, 'z3Awwer', 
      UNION ALL SELECT 1,1, 'gA1sd03',
      UNION ALL SELECT 1,2, 'z3Awwer', 
      UNION ALL SELECT 1,2, 'gA1sd03',
      UNION ALL SELECT 1,3, 'gA1sd03',
      UNION ALL SELECT 1,3, 'sAs10sdf4',
      UNION ALL SELECT 1,4, 'sAs10sdf4',
      UNION ALL SELECT 1,5, 'Adf105', 
      UNION ALL SELECT 2,1, 'A1sdf02',
      UNION ALL SELECT 2,1, '345A103',
      UNION ALL SELECT 2,2, '345A103',
      UNION ALL SELECT 2,2, 'A1sd04',
      UNION ALL SELECT 2,3, 'A1sd04',
      UNION ALL SELECT 2,4, '6_0Awe105'

I want to filter it by temp_id if all names of one temp_id included in some another temp_id in partition by big_id window. For example I do not need to select all rows where temp_id = 2 because all names of temp_id = 2 included in temp_id = 1. As well as need to keep all rows of temp_id = 1 because this names range covers names range of temp_id = 2

So expected output:

SELECT 1 as big_id, 1 as temp_id, '101' as names
      UNION ALL SELECT 1,1, 'z3Awwer', 
      UNION ALL SELECT 1,1, 'gA1sd03',     
      UNION ALL SELECT 1,3, 'gA1sd03',
      UNION ALL SELECT 1,3, 'sAs10sdf4',     
      UNION ALL SELECT 1,5, 'Adf105', 
      UNION ALL SELECT 2,1, 'A1sdf02',
      UNION ALL SELECT 2,1, '345A103',
      UNION ALL SELECT 2,2, '345A103',
      UNION ALL SELECT 2,2, 'A1sd04',      
      UNION ALL SELECT 2,4, '6_0Awe105'

How can I make it using BigQuery?

Upvotes: 1

Views: 174

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Below is for BigQuery Standard SQL

#standardsql
with temp as (
  select big_id, temp_id, array_agg(names) names
  from `project.dataset.table`
  group by big_id, temp_id
)
select big_id, temp_id, names 
from (
  select big_id, temp_id, any_value(names) names 
  from (
    select t1.*,
      ( select count(1)
        from t1.names name
        join t2.names name
        using(name)
        where t1.temp_id != t2.temp_id
      ) = array_length(t1.names) as flag
    from temp t1 
    join temp t2
    using (big_id)
  )
  group by big_id, temp_id
  having countif(flag) = 0
), unnest(names) names    

If to apply above to sample data from your question - the output is

enter image description here

Upvotes: 2

Related Questions