Reputation: 643
I have the following data:
col1 |
---|
[1,2,3] |
[1,2] |
[2,3] |
I want the following output
ans |
---|
[2] |
In other words, I want the intersection of arrays in all rows. Any pointers?
Upvotes: 0
Views: 922
Reputation: 7287
Another approach could be seen below. This is assuming that your arrays does not have duplicate values.
cte
groups by array value, and gets all the indices that contain this value and counts them. It also assigns row_number for each row. This can now be filtered where the maximum row_number = count of indices that contain the value.
with sample_data as (
select [1,2,3] as col1
union all select [1,2] as col1
union all select [2,3] as col1
),
cte as (
select
col1[safe_offset(index)] as arr_value,
array_length(array_agg(index)) as exist_in_n_rows, -- get indices where values appeared then count them
row_number() over () as rn
from sample_data,
unnest(generate_array(0,array_length(col1)-1)) as index
group by 1
)
select
array_agg(intersection) as ans
from (
select
arr_value as intersection,
from cte
where true
qualify max(rn) over () = exist_in_n_rows
)
Output:
NOTE: If you have duplicate values in the array, it is much better to de duplicate them first prior to running the query above.
Upvotes: 2
Reputation: 3528
Intersection of array of all rows. There is the need to unnest the array and group by each element and count how often it is in a row present.
Lets start with generating your datset in table tbl
.
Then in a helper
table, we generate a row_number, starting with one for each row. We also count the maximum numbers of rows.
In the last step, we unnest the col1
column with all the array elements and call them y
. We group by y
and count for each unique entry of row_num
in the new column ref
. having
does a comparing of the ref
to the total amount of rows in the beginning and filters only the element which are present in all arrays.
With tbl as (select a.x as col1 from unnest( [struct([1,2,3,3] as x),struct([1,2]),struct([2,3])] )as a),
helper as ( select col1,row_number() over () as row_num,count(col1) over () as counts from tbl)
select y,count(distinct row_num) as ref, any_value(counts) ref_target
from helper , unnest(col1) y
group by 1
having ref=ref_target
Upvotes: 1
Reputation: 172994
Consider below approach
select array_agg(item) ans from (
select distinct item
from your_table t, t.col1 item
qualify count(distinct format('%t', col1)) over(partition by item)
= count(distinct format('%t', col1)) over()
)
Upvotes: 2