Reputation: 417
I have a quite long table that is 2.47GB big and has 112.6 million rows. The table has 8 columns, being the first one of TIMESTAMP
type and the other 7 FLOAT
type. The table is partiotioned (day) and clustered by datetime
.
The table looks like this:
datetime col1 col2 col3 col4 col5 col6 col7
2020-07-29 07:19:24.920 UTC null null null null null null 0.01
2020-07-29 16:41:53.094 UTC 1.3344 null null 0.6975 null null 0.01
2020-07-29 08:05:23.705 UTC null 1.698 null null null null 0.01
2020-07-29 18:12:21.396 UTC 1.33435 null null null null null 0.01
2020-07-29 19:49:26.073 UTC null 1.12345 null null 1.33435 null 0.01
2020-07-29 19:33:21.540 UTC null null null null 1.33377 null 0.01
2020-07-29 04:11:24.596 UTC 1.256 null null null 1.33694 null 0.01
2020-07-29 09:27:05.052 UTC null null null 0.6868 null 104.889 0.01
What I'm trying to do is to populate other table with the same values of this table but performing a fowrward fill of null
values, which I was reading in docs that can be achieved using LAST_VALUE
function.
The query I'm trying to perform uses LAST_VALUE
with OVER
for all columns except for the last one.
INSERT project.dataset.table (datetime, col1,col2,col3,col4,col5,col6)
WITH current AS(
SELECT
datetime,
LAST_VALUE(col1 IGNORE NULLS) OVER (ORDER BY datetime) AS col1,
LAST_VALUE(col2 IGNORE NULLS) OVER (ORDER BY datetime) AS col2,
LAST_VALUE(col3 IGNORE NULLS) OVER (ORDER BY datetime) AS col3,
LAST_VALUE(col4 IGNORE NULLS) OVER (ORDER BY datetime) AS col4,
LAST_VALUE(col5 IGNORE NULLS) OVER (ORDER BY datetime) AS col5,
LAST_VALUE(col6 IGNORE NULLS) OVER (ORDER BY datetime) AS col6,
FROM
project.dataset.origin_table
ORDER BY
datetime)
select * from current
However, running this query returns this error:
Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 100% of limit. Top memory consumer(s): sort operations used for analytic OVER() clauses: 97% other/unattributed: 3%
So, for reading the error message, its clear that OVER
is the issue, unless I'm missing something or I'm not understanding something.
How can I achieve this? The desired output would be like this:
datetime col1 col2 col3 col4 col5 col6 col7
2020-07-29 07:19:24.920 UTC null null null null null null 0.01
2020-07-29 16:41:53.094 UTC 1.3344 null null 0.6975 null null 0.01
2020-07-29 08:05:23.705 UTC 1.3344 1.698 null 0.6975 null null 0.01
2020-07-29 18:12:21.396 UTC 1.33435 1.698 null 0.6975 null null 0.01
2020-07-29 19:49:26.073 UTC 1.33435 1.12345 null 0.6975 1.33435 null 0.01
2020-07-29 19:33:21.540 UTC 1.33435 1.12345 null 0.6975 1.33377 null 0.01
2020-07-29 04:11:24.596 UTC 1.256 1.12345 null 0.6975 1.33694 null 0.01
2020-07-29 09:27:05.052 UTC 1.256 1.12345 null 0.6868 1.33694 104.889 0.01
To fill null values with the last value encountered in each column.
Thank you!
Upvotes: 2
Views: 725
Reputation: 77
The error message you're seeing refers to the sorting part of your OVER clause, i.e., "ORDER BY".
When using analytic functions, all the data in each of the final partitions need to be handled by the same worker. When the amount of data to be handled, and in this case, sorted, is really high, sometimes it can be allocated in memory and fails. At the moment there's no option to increase the memory on these situations, only workarounds.
Try to work with only one column in a first stage, and then paste all the columns in a single table.
Upvotes: 0
Reputation: 1270713
The problem isn't specifically LAST_VALUE()
. The problem is the lack of PARTITION BY
in the window specification. You can validate this with a little trick, which would be to introduce the date as a partitioning element.
So, does this work?
LAST_VALUE (col1 IGNORE NULLS) OVER (PARTITION BY DATETIME_TRUNC(datetime, day) ORDER BY datetime) AS col1,
I am not saying that this will return the results you want, but it should fix the resources exceeded problem.
Then . . . If might be that you have a natural PARTITION BY
that should be used. If so, use it. If not, there may be other ways to get the results you want.
Upvotes: 0