user2264178
user2264178

Reputation: 43

Workaround needed for the incomplete implementation of window functions in Google Big Query (GBQ)

The recently announced (and long-awaited) window analytical functions in Google Big Query seems very incomplete. Below is a query that was supposed to calculate the combined weighted score of several (3 in the example) financial indicators of a company (this works in Oracle and MS SQL!):

select 
  (  percent_rank() over (order by cash_discount desc)*0.4
    +percent_rank() over (order by trailing_pe desc)*0.5
    +percent_rank() over (order by forward_pe desc)*0.5
  ) combined_weighted_score
 ,company_symbol, company_name
 from  workspace.company c

In GBQ we get: Error: Encountered " "OVER" "over "" at line 2.

I decided to research the issue further - here is an even simpler query:

select 
      percent_rank() over (order by cash_discount desc) r1
     ,percent_rank() over (order by trailing_pe desc) r2
     ,percent_rank() over (order by forward_pe desc) r3
     ,company_symbol, company_name
from  workspace.company c

which gets me this: Error: Multiple distinct OVER clauses are not yet supported. If more than one OVER clauses appear in a query they must be identical. Found AnalyticSpecs that differ in sort keys 0

So, GBQ appears to have very incomplete implementation of analytical window functions! Next, I tried some ugly hacks like:

select c.company_name, (r1.r*0.4+r2.r*0.5) combined_weighted_score
from [workspace.company] c
left join (select company_symbol, percent_rank() over (order by cash_discount desc) r from workspace.company) r1 
  on c.company_symbol=r1.company_symbol
left join (select company_symbol, percent_rank() over (order by trailing_pe desc) r from workspace.company) r2 
  on c.company_symbol=r2.company_symbol

well, still does not work:

Error: 5.1 - 0.0: A query cannot have multiple JOIN clauses

Any solutions? I know this is very hard to implement in classic SQL, without the help of properly implemented window functions. But GBQ annoucement sounded very promissing ...

Upvotes: 0

Views: 299

Answers (1)

Felipe Hoffa
Felipe Hoffa

Reputation: 59175

You almost got it!

Every OVER() clause needs the same predicate in the same SELECT statement, but you can JOIN 2 different tables - as you found out.

Only 2 tables can be joined, but you need 3. The easy workaround is to do a subquery and JOIN that one.

Instead of:

SELECT x.a, y.b, z.c
FROM x JOIN y ON x.j=y.j JOIN z ON y.j=z.j

you can do:

SELECT x.a, yz.b, yz.c
FROM x JOIN (
  SELECT y.b b, z.c c, y.j j
  FROM y JOIN z on y.j=z.j) yz
ON x.j=yz.j

(you might need to change the JOIN for a JOIN EACH)

Upvotes: 2

Related Questions