Reputation: 1544
I was looking at using LAG()
to return a value from the previous row in an ARRAY of STRUCT
, and it seems like the navigation window functions like LAG/LEAD
are not supported with UNNEST
.
It sounds suboptimal but you need to UNNEST
all and (re)PARTITION BY
!!
E.g. say you want the amount of the previous line:
EDIT: Schema details:
Invoices schema :
docID STRING, lines STRUCT<lineID STRING, amount NUMERIC>
Expected output:
docID STRING, lines STRUCT<lineID STRING, amount NUMERIC, prev_amount NUMERIC>
SELECT
docID, lineID, amount
,LAG(amount) OVER (PARTITION BY docID ORDER by lineID) prev_amount
FROM invoices, UNNEST(lines)
This is pretty slow. I wish we could somehow pass the ARRAY as window frame to use.
Any alternative ? I am using BigQuery
but this may apply to other DBMS.
UPDATE: I have come up with below alternative using
WITH OFFSET
. Faster and maybe this is what OFFSET is meant to be used for, it's just ugly and could be simpler in my mind.
SELECT
docID, lineID, amount
,(SELECT amount FROM UNNEST(lines) WITH OFFSET AS pos2 WHERE pos2 = pos - 1) prev_amount
FROM invoices, UNNEST(lines) WITH OFFSET AS pos
Upvotes: 1
Views: 2035
Reputation: 1544
Below alternative to Mikhail's answer, using WITH OFFSET
instead, performs better (55s vs. 170s, over 50M rows)
#standardSQL
SELECT docID, ARRAY(
SELECT AS STRUCT
l.*,
lines[SAFE_OFFSET(pos - 1)].amount prev_amount
FROM UNNEST(lines) l WITH OFFSET pos
) lines
FROM invoices
Note: this will not work as-is if ORDER BY
or a WHERE
clause is added to the unnesting that changes the original indexing.
This was my use case, and if you are after relative positioning in an filtered array, you should use a subquery to create the filtered array first, and then the combination UNNEST WITH OFFSET
+ OFFSET/ORDINAL
in the outer query.
Upvotes: 1
Reputation: 172944
Below shows better performance (at least on small amount that I run it against for quick testing). Also execution plan looks much better (to me at least)
#standardSQL
SELECT
lines[OFFSET(pos)].*, lines[SAFE_OFFSET(pos - 1)].amount prev_amount
FROM invoices, UNNEST(GENERATE_ARRAY(0, ARRAY_LENGTH(lines) - 1)) pos
Note: I was assuming following schema [STRUCT<docID INT64, lineID INT64, amount FLOAT64>] lines
UPDATE: Adjustment to match to real schema you just provided
#standardSQL
SELECT docID, ARRAY(
SELECT AS STRUCT
lines[OFFSET(pos)].*,
lines[SAFE_OFFSET(pos - 1)].amount prev_amount
FROM UNNEST(GENERATE_ARRAY(0, ARRAY_LENGTH(lines) - 1)) pos
) lines
FROM invoices
Upvotes: 1