user2264178
user2264178

Reputation: 43

weighted ranking/ combined score in Google Big Query

...Spent several hours trying what not and researching this forum. Quite pessimistic at this point about the usefulness of Google Big Query (GBQ) for anything more than trivial queries, but here is one last desperate try, maybe someone has better ideas:

Let's say we have a COUNTRY table with average population weight(in kilograms) and height (in meters) per country as follows:

  country   |  continent   | weight | height |
  ============================================
  US        | America      | 200    | 2.00   |
  Canada    | America      | 170    | 1.90   |
  France    | Europe       | 160    | 1.78   |
  Germany   | Europe       | 110    | 2.00   |

Let's say you want to pick out and live in the European country with "smallest" people, where you define the measure "smallness" as the weighted sum of body weight and height with some constant weights, such as 0.6 for body weight and 0.4 for body height.

In Oracle or MS SQL server this can be done elegantly and compactly by using analytic window functions such as rank() and row_number(), for example:

select country, combined_score
from (select 
        country
       ,( 0.6*rank(weight) over()  + 0.4*rank(height) over() ) combined_score
from country
where continent = 'Europe')
order by combined_score

Note that the ranking is done after the filtering for continent. The continent filter is dynamic (say input from a web form), so the ranking can not be pre-calculated and stored in the table in advance!

In GBQ there are no rank() , row_number() or over(). Even if you try some "poor man" hacks it is still not going to work because GBQ does not support correlated queries. Here are similar attempts by other people with some pretty unsatisfactory and inefficient results:

BigQuery SQL running totals

Row number in BigQuery?

Any ideas how this can be done? I can even restructure the data to use nested records, if it helps. Thank you in advance!

Upvotes: 3

Views: 1258

Answers (2)

Felipe Hoffa
Felipe Hoffa

Reputation: 59175

An equivalent for RANK in BigQuery is row_number().

For example, the top 5 contributors to Wikipedia, with row_number giving their place:

SELECT
  ROW_NUMBER() OVER() row_number,
  contributor_username,
  count,
FROM (
  SELECT contributor_username, COUNT(*) count,
  FROM [publicdata:samples.wikipedia]
  GROUP BY contributor_username
  ORDER BY COUNT DESC
  LIMIT 5)

Upvotes: 1

Jeremy Condit
Jeremy Condit

Reputation: 7046

In your specific example, I think you can compute the result without using RANK and OVER at all:

SELECT country, score
FROM (SELECT country, 0.6 * weight + 0.4 * height AS score
      FROM t WHERE continent = 'Europe')
ORDER BY score;

However, I'm assuming that this is a toy example and that your real problem involves use of RANK more in line with your example query. In that case, BigQuery does not yet support analytic functions directly, but we'll consider this to be a feature request. :-)

Upvotes: 1

Related Questions