Algunenano
Algunenano

Reputation: 134

Hitting Bigquery JS UDF limits with multiple rows

I'm having issues with BigQuery JS UDF limits. The only memory limit documented [1] is a approximately 5 MB or less limit to the amount of data output by the UDF so I've set up an internal check to return an error when the output buffer is over ~3MB (before base64 transformation); but even with that, I'm hitting limits when running a UDF against a series of rows, although individually they work fine.

For example, this fails with Resources exceeded during query execution: UDF out of memory.

WITH joined_data AS
(
  SELECT z, x, y, ARRAY_AGG(data) as data
  FROM `rmr_tests.z6_pre_agg`
  WHERE x = 14
  GROUP BY z, x, y
)
SELECT z, x, y, rmr_tests.UDFNAME(data, 4096, '{}') as mvt from joined_data

Job id: cartodb-gcp-backend-data-team:US.bquxjob_1628ffa1_170c4d0d50f

On the other hand, if I trick Bigquery to batch the calls to the UDF differently I don't have any issues:

WITH joined_data AS
(
  SELECT z, x, y,
      ARRAY_AGG(data) as data,
      ABS(MOD(FARM_FINGERPRINT(CONCAT(CAST(x AS STRING), y, CAST(z AS STRING))), 5)) AS r
  FROM `rmr_tests.z6_pre_agg`
  WHERE x = 14
  GROUP BY z, x, y
)
SELECT z, x, y, rmr_tests.UDFNAME(data, 4096, '{}') as mvt FROM joined_data WHERE r = 0
UNION ALL 
SELECT z, x, y, rmr_tests.UDFNAME(data, 4096, '{}') as mvt FROM joined_data WHERE r = 1
UNION ALL 
SELECT z, x, y, rmr_tests.UDFNAME(data, 4096, '{}') as mvt FROM joined_data WHERE r = 2
UNION ALL 
SELECT z, x, y, rmr_tests.UDFNAME(data, 4096, '{}') as mvt FROM joined_data WHERE r = 3
UNION ALL 
SELECT z, x, y, rmr_tests.UDFNAME(data, 4096, '{}') as mvt FROM joined_data WHERE r = 4

Job-id: cartodb-gcp-backend-data-team:US.bquxjob_3f216117_170c4d2abae

I'm not sure if it is important, but the data passed to the UDF can be large (~20 MB in this example) and the output is of type BYTE (biggest is 2.71 MB as reported by Length(mvt)).

There was a similar issue in the past [2] which was referenced a possible issue with v8 but I don't have a way of knowing if it's the same issue.

Is this an issue in my UDF or an issue with v8+Bigquery? I've tested an equivalent code under node and memory remains stable (no noticeable leaks after running for 20-30 minutes).

[1] - https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions

[2] - https://stackoverflow.com/a/35563562

Edit: Reported it in the bug tracker: https://issuetracker.google.com/u/1/issues/151212192

Upvotes: 2

Views: 930

Answers (1)

Rafael L
Rafael L

Reputation: 11

When you say "I'm hitting limits when running a UDF against a series of rows, although individually they work fine". I think that is the exact reason you are hitting limits since the "UDF limits documentation" refers to a single row as follows:

"The amount of data that your JavaScript UDF outputs when processing a single row — approximately 5 MB or less."

Upvotes: 0

Related Questions