sha
sha

Reputation: 643

BigQuery array intersection across all rows

I have the following data:

col1
[1,2,3]
[1,2]
[2,3]

I want the following output

ans
[2]

In other words, I want the intersection of arrays in all rows. Any pointers?

Upvotes: 0

Views: 922

Answers (3)

Ricco D
Ricco D

Reputation: 7287

Another approach could be seen below. This is assuming that your arrays does not have duplicate values.

cte groups by array value, and gets all the indices that contain this value and counts them. It also assigns row_number for each row. This can now be filtered where the maximum row_number = count of indices that contain the value.

with sample_data as (
    select [1,2,3] as col1
    union all select [1,2] as col1
    union all select [2,3] as col1
),
cte as (
select 
  col1[safe_offset(index)] as arr_value,
  array_length(array_agg(index)) as exist_in_n_rows, -- get indices where values appeared then count them
  row_number() over () as rn
from sample_data,
  unnest(generate_array(0,array_length(col1)-1)) as index
group by 1
)
select 
array_agg(intersection) as ans
from (
  select 
    arr_value as intersection,
  from cte
    where true
    qualify max(rn) over () = exist_in_n_rows
)

Output:

enter image description here

NOTE: If you have duplicate values in the array, it is much better to de duplicate them first prior to running the query above.

Upvotes: 2

Samuel
Samuel

Reputation: 3528

Intersection of array of all rows. There is the need to unnest the array and group by each element and count how often it is in a row present.

Lets start with generating your datset in table tbl.

Then in a helper table, we generate a row_number, starting with one for each row. We also count the maximum numbers of rows.

In the last step, we unnest the col1 column with all the array elements and call them y. We group by y and count for each unique entry of row_num in the new column ref. having does a comparing of the ref to the total amount of rows in the beginning and filters only the element which are present in all arrays.

With tbl as (select a.x as col1 from unnest( [struct([1,2,3,3] as x),struct([1,2]),struct([2,3])] )as a),
helper as ( select col1,row_number() over () as row_num,count(col1) over () as counts  from tbl)
select y,count(distinct row_num) as ref, any_value(counts) ref_target  
from helper , unnest(col1) y 
group by 1
having ref=ref_target

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172994

Consider below approach

select array_agg(item) ans from (
  select distinct item
  from your_table t, t.col1 item
  qualify count(distinct format('%t', col1)) over(partition by item) 
    = count(distinct format('%t', col1)) over()
)

Upvotes: 2

Related Questions