BigQuery: LAG() on UNNEST()

Question

I was looking at using LAG() to return a value from the previous row in an ARRAY of STRUCT, and it seems like the navigation window functions like LAG/LEAD are not supported with UNNEST.

It sounds suboptimal but you need to UNNEST all and (re)PARTITION BY !!

E.g. say you want the amount of the previous line:

EDIT: Schema details:

Invoices schema : docID STRING, lines STRUCT

Expected output: docID STRING, lines STRUCT

SELECT 
   docID, lineID, amount
   ,LAG(amount) OVER (PARTITION BY docID ORDER by lineID) prev_amount
FROM invoices, UNNEST(lines)

This is pretty slow. I wish we could somehow pass the ARRAY as window frame to use.

Any alternative ? I am using BigQuery but this may apply to other DBMS.

UPDATE: I have come up with below alternative using WITH OFFSET. Faster and maybe this is what OFFSET is meant to be used for, it's just ugly and could be simpler in my mind.

SELECT 
   docID, lineID, amount
   ,(SELECT amount FROM UNNEST(lines) WITH OFFSET AS pos2 WHERE pos2 = pos - 1) prev_amount
FROM invoices, UNNEST(lines) WITH OFFSET AS pos

yan-hic · Accepted Answer

Below alternative to Mikhail's answer, using WITH OFFSET instead, performs better (55s vs. 170s, over 50M rows)

#standardSQL
SELECT docID, ARRAY(
    SELECT AS STRUCT 
      l.*, 
      lines[SAFE_OFFSET(pos - 1)].amount prev_amount 
    FROM UNNEST(lines) l WITH OFFSET pos
    ) lines
FROM invoices

Note: this will not work as-is if ORDER BY or a WHERE clause is added to the unnesting that changes the original indexing.

This was my use case, and if you are after relative positioning in an filtered array, you should use a subquery to create the filtered array first, and then the combination UNNEST WITH OFFSET + OFFSET/ORDINAL in the outer query.

BigQuery: LAG() on UNNEST()

Answers (2)

Related Questions