Reputation: 163
I have notice that running the same algorithm can take much longer (up to x60) when running in a Bigquery UDF than when running directly.
The two code snippets below that illustrate the problem:
The first one complete in 1.6 sec
WITH T_ AS (
SELECT road_geom g
FROM `bigquery-public-data.geo_us_roads.all_roads_04`
ORDER BY ST_LENGTH(road_geom) desc
LIMIT 1
),
T0 AS (
SELECT ST_PointN(g,idx) point, idx
FROM T_, UNNEST(GENERATE_ARRAY(1, ST_NPOINTS(g))) AS idx
)
SELECT point
FROM T0
ORDER BY ST_LENGTH(ST_MAKELINE(ARRAY_AGG(point) OVER ( ORDER BY idx ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING))) DESC
LIMIT 1
The second finished in 1 min 47 sec
CREATE TEMP FUNCTION
myfunction(g GEOGRAPHY) AS ((
WITH
T0 AS (
SELECT ST_PointN(g,idx) point, idx
FROM UNNEST(GENERATE_ARRAY(1, ST_NPOINTS(g))) AS idx
)
SELECT point
FROM T0
ORDER BY ST_LENGTH(ST_MAKELINE(ARRAY_AGG(point) OVER ( ORDER BY idx ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING))) DESC
LIMIT 1));
SELECT myfunction(road_geom)
FROM `bigquery-public-data.geo_us_roads.all_roads_04`
ORDER BY ST_LENGTH(road_geom) DESC
LIMIT 1
Upvotes: 0
Views: 151
Reputation: 7744
These seems to be somewhat different queries.
First one computes a single row in T_
by taking the longest road, then uses it in T0
computation to extract a list of points, and then computes longest segment.
Second one computes this for every road, and only then takes a single result by taking the longest road. I guess the optimizer is not quite smart enough to rewrite the second query.
I suggest writing the last query as following, which makes it equally fast:
SELECT myfunction(road_geom)
FROM (
SELECT *
FROM `bigquery-public-data.geo_us_roads.all_roads_04`
ORDER BY ST_LENGTH(road_geom) DESC
LIMIT 1
)
Upvotes: 2