Reputation: 43
...Spent several hours trying what not and researching this forum. Quite pessimistic at this point about the usefulness of Google Big Query (GBQ) for anything more than trivial queries, but here is one last desperate try, maybe someone has better ideas:
Let's say we have a COUNTRY table with average population weight(in kilograms) and height (in meters) per country as follows:
country | continent | weight | height | ============================================ US | America | 200 | 2.00 | Canada | America | 170 | 1.90 | France | Europe | 160 | 1.78 | Germany | Europe | 110 | 2.00 |
Let's say you want to pick out and live in the European country with "smallest" people, where you define the measure "smallness" as the weighted sum of body weight and height with some constant weights, such as 0.6 for body weight and 0.4 for body height.
In Oracle or MS SQL server this can be done elegantly and compactly by using analytic window functions such as rank() and row_number(), for example:
select country, combined_score from (select country ,( 0.6*rank(weight) over() + 0.4*rank(height) over() ) combined_score from country where continent = 'Europe') order by combined_score
Note that the ranking is done after the filtering for continent. The continent filter is dynamic (say input from a web form), so the ranking can not be pre-calculated and stored in the table in advance!
In GBQ there are no rank() , row_number() or over(). Even if you try some "poor man" hacks it is still not going to work because GBQ does not support correlated queries. Here are similar attempts by other people with some pretty unsatisfactory and inefficient results:
Any ideas how this can be done? I can even restructure the data to use nested records, if it helps. Thank you in advance!
Upvotes: 3
Views: 1258
Reputation: 59175
An equivalent for RANK in BigQuery is row_number().
For example, the top 5 contributors to Wikipedia, with row_number giving their place:
SELECT
ROW_NUMBER() OVER() row_number,
contributor_username,
count,
FROM (
SELECT contributor_username, COUNT(*) count,
FROM [publicdata:samples.wikipedia]
GROUP BY contributor_username
ORDER BY COUNT DESC
LIMIT 5)
Upvotes: 1
Reputation: 7046
In your specific example, I think you can compute the result without using RANK and OVER at all:
SELECT country, score
FROM (SELECT country, 0.6 * weight + 0.4 * height AS score
FROM t WHERE continent = 'Europe')
ORDER BY score;
However, I'm assuming that this is a toy example and that your real problem involves use of RANK more in line with your example query. In that case, BigQuery does not yet support analytic functions directly, but we'll consider this to be a feature request. :-)
Upvotes: 1