In BigQuery, how to sort values separately for each column

Question

Is it possible to sort each column separately in a table in bigquery? We have a table of statistics and we would like to sort each column separately within partitions. We have 3 stats stat1, stat2, stat3 and gender is our partition key:

with
  stats as (
    select 'male' as gender, .60 as stat1, 23 as stat2, .10 as stat3 union all
    select 'male' as gender, .62 as stat1, 28 as stat2, .12 as stat3 union all
    select 'male' as gender, .57 as stat1, 21 as stat2, .16 as stat3 union all
    select 'male' as gender, .51 as stat1, 18 as stat2, .14 as stat3 union all
    select 'male' as gender, .53 as stat1, 17 as stat2, .18 as stat3 union all
    select 'male' as gender, .46 as stat1, 31 as stat2, .08 as stat3 union all
    select 'male' as gender, .49 as stat1, 32 as stat2, .07 as stat3 union all
    select 'male' as gender, .55 as stat1, 40 as stat2, .23 as stat3 union all
    select 'male' as gender, .68 as stat1, 41 as stat2, .33 as stat3 union all
    select 'male' as gender, .56 as stat1, 36 as stat2, .32 as stat3 union all
    select 'female' as gender, .80 as stat1, 32 as stat2, .42 as stat3 union all
    select 'female' as gender, .82 as stat1, 24 as stat2, .43 as stat3 union all
    select 'female' as gender, .73 as stat1, 26 as stat2, .33 as stat3 union all
    select 'female' as gender, .85 as stat1, 27 as stat2, .55 as stat3 union all
    select 'female' as gender, .91 as stat1, 29 as stat2, .53 as stat3 union all
    select 'female' as gender, .88 as stat1, 13 as stat2, .51 as stat3 union all
    select 'female' as gender, .86 as stat1, 38 as stat2, .49 as stat3 union all
    select 'female' as gender, .77 as stat1, 35 as stat2, .40 as stat3 union all
    select 'female' as gender, .74 as stat1, 15 as stat2, .58 as stat3 union all
    select 'female' as gender, .95 as stat1, 17 as stat2, .59 as stat3
  ),

  -- we create rank columns
  stats_with_ranks as (
    select
      *
      ,row_number() over (partition by gender) as rank
      ,rank() over (partition by gender order by stat1 desc) as stat1rk
      ,rank() over (partition by gender order by stat2 desc) as stat2rk
      ,rank() over (partition by gender order by stat3 desc) as stat3rk
    from stats
  )

select * from stats_with_ranks where gender = 'female' order by rank asc

This is not the correct table for our problem. We'd like each metric to be sorted within its own column such that its values correspond to the singular rank column. For example, the highest value for each of stat1, stat2, stat3 would be in the same row as rank = 1. Something like this:

We came up with the following approach:

select
  a.gender
  ,a.rank
  ,b.stat1 as stat1
  ,c.stat2 as stat2
  ,d.stat3 as stat3
from stats_with_ranks as a
left join stats_with_ranks as b on a.gender = b.gender and a.rank = b.stat1rk
left join stats_with_ranks as c on a.gender = c.gender and a.rank = c.stat2rk
left join stats_with_ranks as d on a.gender = d.gender and a.rank = d.stat3rk
order by gender asc, rank asc

however this approach doesn't scale with our larger data set, where there are 200+ stats in a table that we are ranking. We very quickly get the error Resources exceeded during query execution: Not enough resources for query planning - too many subqueries or query is too complex. when there are 200+ of these left joins. This is why we are looking for a solution that sorts within columns.

Mikhail Berlyant · Accepted Answer

We don't mind listing out all 200+ stats so long as the query resources aren't exceeded ...

create temp function  extract_keys(input string) returns array language js as "return Object.keys(JSON.parse(input));";
create temp function  extract_values(input string) returns array language js as "return Object.values(JSON.parse(input));";
select * from (
  select gender, col, cast(stat as float64) stat, row_number() over(partition by gender, col order by cast(stat as float64) desc) rank
  from stats t, 
  unnest([struct(to_json_string((select as struct * except(gender) from unnest([t]))) as json)]),
  unnest(extract_keys(json)) col with offset
  join unnest(extract_values(json)) stat with offset
  using(offset)
)
pivot (any_value(stat) for col in ('stat1','stat2','stat3'))

if applied to sample data in your question - output is

In BigQuery, how to sort values separately for each column

Answers (2)

Related Questions