Canovice
Canovice

Reputation: 10173

In BigQuery, how to sort values separately for each column

Is it possible to sort each column separately in a table in bigquery? We have a table of statistics and we would like to sort each column separately within partitions. We have 3 stats stat1, stat2, stat3 and gender is our partition key:

with
  stats as (
    select 'male' as gender, .60 as stat1, 23 as stat2, .10 as stat3 union all
    select 'male' as gender, .62 as stat1, 28 as stat2, .12 as stat3 union all
    select 'male' as gender, .57 as stat1, 21 as stat2, .16 as stat3 union all
    select 'male' as gender, .51 as stat1, 18 as stat2, .14 as stat3 union all
    select 'male' as gender, .53 as stat1, 17 as stat2, .18 as stat3 union all
    select 'male' as gender, .46 as stat1, 31 as stat2, .08 as stat3 union all
    select 'male' as gender, .49 as stat1, 32 as stat2, .07 as stat3 union all
    select 'male' as gender, .55 as stat1, 40 as stat2, .23 as stat3 union all
    select 'male' as gender, .68 as stat1, 41 as stat2, .33 as stat3 union all
    select 'male' as gender, .56 as stat1, 36 as stat2, .32 as stat3 union all
    select 'female' as gender, .80 as stat1, 32 as stat2, .42 as stat3 union all
    select 'female' as gender, .82 as stat1, 24 as stat2, .43 as stat3 union all
    select 'female' as gender, .73 as stat1, 26 as stat2, .33 as stat3 union all
    select 'female' as gender, .85 as stat1, 27 as stat2, .55 as stat3 union all
    select 'female' as gender, .91 as stat1, 29 as stat2, .53 as stat3 union all
    select 'female' as gender, .88 as stat1, 13 as stat2, .51 as stat3 union all
    select 'female' as gender, .86 as stat1, 38 as stat2, .49 as stat3 union all
    select 'female' as gender, .77 as stat1, 35 as stat2, .40 as stat3 union all
    select 'female' as gender, .74 as stat1, 15 as stat2, .58 as stat3 union all
    select 'female' as gender, .95 as stat1, 17 as stat2, .59 as stat3
  ),

  -- we create rank columns
  stats_with_ranks as (
    select
      *
      ,row_number() over (partition by gender) as rank
      ,rank() over (partition by gender order by stat1 desc) as stat1rk
      ,rank() over (partition by gender order by stat2 desc) as stat2rk
      ,rank() over (partition by gender order by stat3 desc) as stat3rk
    from stats
  )

select * from stats_with_ranks where gender = 'female' order by rank asc

enter image description here

This is not the correct table for our problem. We'd like each metric to be sorted within its own column such that its values correspond to the singular rank column. For example, the highest value for each of stat1, stat2, stat3 would be in the same row as rank = 1. Something like this:

enter image description here

We came up with the following approach:

select
  a.gender
  ,a.rank
  ,b.stat1 as stat1
  ,c.stat2 as stat2
  ,d.stat3 as stat3
from stats_with_ranks as a
left join stats_with_ranks as b on a.gender = b.gender and a.rank = b.stat1rk
left join stats_with_ranks as c on a.gender = c.gender and a.rank = c.stat2rk
left join stats_with_ranks as d on a.gender = d.gender and a.rank = d.stat3rk
order by gender asc, rank asc

however this approach doesn't scale with our larger data set, where there are 200+ stats in a table that we are ranking. We very quickly get the error Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex. when there are 200+ of these left joins. This is why we are looking for a solution that sorts within columns.

Upvotes: 1

Views: 719

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

We don't mind listing out all 200+ stats so long as the query resources aren't exceeded ...

create temp function  extract_keys(input string) returns array<string> language js as "return Object.keys(JSON.parse(input));";
create temp function  extract_values(input string) returns array<string> language js as "return Object.values(JSON.parse(input));";
select * from (
  select gender, col, cast(stat as float64) stat, row_number() over(partition by gender, col order by cast(stat as float64) desc) rank
  from stats t, 
  unnest([struct(to_json_string((select as struct * except(gender) from unnest([t]))) as json)]),
  unnest(extract_keys(json)) col with offset
  join unnest(extract_values(json)) stat with offset
  using(offset)
)
pivot (any_value(stat) for col in ('stat1','stat2','stat3'))    

if applied to sample data in your question - output is

enter image description here

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below answer assumes that all stats column are of the same data type and have naming convention as statN

execute immediate (select '''
  select * from (
    select *, row_number() over(partition by gender, col order by stat desc) rank
    from stats
    unpivot (stat for col in (''' || col_list || ''')) 
  )
  pivot (any_value(stat) for col in (''' || val_list || '''))
  order by gender, rank
  '''
  from (
    select 
      string_agg('"stat' || pos || '"') val_list, 
      string_agg('stat' || pos) col_list
    from unnest(generate_array(1,3)) pos
  )
)          

if applied to sample data in your question (with all float64 values)

enter image description here

Upvotes: 2

Related Questions