Reputation: 134
I'm having issues with BigQuery JS UDF limits. The only memory limit documented [1] is a approximately 5 MB or less
limit to the amount of data output by the UDF so I've set up an internal check to return an error when the output buffer is over ~3MB (before base64 transformation); but even with that, I'm hitting limits when running a UDF against a series of rows, although individually they work fine.
For example, this fails with Resources exceeded during query execution: UDF out of memory.
WITH joined_data AS
(
SELECT z, x, y, ARRAY_AGG(data) as data
FROM `rmr_tests.z6_pre_agg`
WHERE x = 14
GROUP BY z, x, y
)
SELECT z, x, y, rmr_tests.UDFNAME(data, 4096, '{}') as mvt from joined_data
Job id: cartodb-gcp-backend-data-team:US.bquxjob_1628ffa1_170c4d0d50f
On the other hand, if I trick Bigquery to batch the calls to the UDF differently I don't have any issues:
WITH joined_data AS
(
SELECT z, x, y,
ARRAY_AGG(data) as data,
ABS(MOD(FARM_FINGERPRINT(CONCAT(CAST(x AS STRING), y, CAST(z AS STRING))), 5)) AS r
FROM `rmr_tests.z6_pre_agg`
WHERE x = 14
GROUP BY z, x, y
)
SELECT z, x, y, rmr_tests.UDFNAME(data, 4096, '{}') as mvt FROM joined_data WHERE r = 0
UNION ALL
SELECT z, x, y, rmr_tests.UDFNAME(data, 4096, '{}') as mvt FROM joined_data WHERE r = 1
UNION ALL
SELECT z, x, y, rmr_tests.UDFNAME(data, 4096, '{}') as mvt FROM joined_data WHERE r = 2
UNION ALL
SELECT z, x, y, rmr_tests.UDFNAME(data, 4096, '{}') as mvt FROM joined_data WHERE r = 3
UNION ALL
SELECT z, x, y, rmr_tests.UDFNAME(data, 4096, '{}') as mvt FROM joined_data WHERE r = 4
Job-id: cartodb-gcp-backend-data-team:US.bquxjob_3f216117_170c4d2abae
I'm not sure if it is important, but the data passed to the UDF can be large (~20 MB in this example) and the output is of type BYTE (biggest is 2.71 MB as reported by Length(mvt)).
There was a similar issue in the past [2] which was referenced a possible issue with v8 but I don't have a way of knowing if it's the same issue.
Is this an issue in my UDF or an issue with v8+Bigquery? I've tested an equivalent code under node and memory remains stable (no noticeable leaks after running for 20-30 minutes).
[1] - https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions
[2] - https://stackoverflow.com/a/35563562
Edit: Reported it in the bug tracker: https://issuetracker.google.com/u/1/issues/151212192
Upvotes: 2
Views: 930
Reputation: 11
When you say "I'm hitting limits when running a UDF against a series of rows, although individually they work fine". I think that is the exact reason you are hitting limits since the "UDF limits documentation" refers to a single row as follows:
"The amount of data that your JavaScript UDF outputs when processing a single row — approximately 5 MB or less."
Upvotes: 0