Reputation: 77
I have a table with the following columns:
items ARRAY<STRUCT<label STRING, counter INTEGER>>
explore BOOLEAN
For each record I would like to choose the label with the highest counter, and then count explore
on each unique label.
Ideally I would like to run something like:
SELECT FIRST_VALUE(items.label) OVER (ORDER BY items.counter DESC) as label,
COUNT(explore) as explore
FROM my_table
GROUP BY 1
If this is the data in my table:
explore items
1 [['A',1],['B',3]]
1 [['B',1]]
0. [['C',2],['D',1]]
Then I would like to get:
label explore
'B' 2
'C' 1
Upvotes: 2
Views: 718
Reputation: 172954
Consider below approach
select ( select label from t.items
order by counter desc limit 1
) label,
count(*) explore
from your_table t
group by label
if applied to sample data in your question
with your_table as (
select 1 explore, [struct('A' as label, 1 as counter), struct('B' as label, 3 as counter) ] items union all
select 1, [struct('B', 1)] union all
select 0, [struct('C', 2), struct('D', 1) ]
)
output is
Upvotes: 1
Reputation: 7277
Using your sample data, consider below approach.
with data as (
select 1 as explore, [STRUCT( 'A' as label, 1 as counter), STRUCT( 'B' as label, 3 as counter) ] as items,
union all select 1 as explore, [STRUCT( 'B' as label, 1 as counter)] as items,
union all select 0 as explore, [STRUCT( 'C' as label, 2 as counter), STRUCT( 'D' as label, 1 as counter) ] as items
),
add_row_num as (
SELECT
explore,
items,
row_number() over (order by explore desc) as row_number
FROM data
),
get_highest_label as (
select
explore,
row_number,
label,
counter,
first_value(label) over (partition by row_number order by counter desc) as highest_label_per_row
from add_row_num, unnest(items)
),
-- https://stackoverflow.com/questions/36675521/delete-duplicate-rows-from-a-bigquery-table (REMOVE DUPLICATE)
remove_dups as (
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY row_number) as new_row_number
FROM get_highest_label
)
select
highest_label_per_row,
count(highest_label_per_row) as explore,
from remove_dups
where new_row_number = 1
group by highest_label_per_row
Output:
Upvotes: 0