Reputation: 179
I have a GeoJSON string for a multipoint geometry. I want to extract each of those points to a table of individual point geometries in BigQuery
I have been able to achieve point geometry for one of the points. I want to do it for all the others as well in a automated fashion. I've already tried converting the string to an array but it remains an array of size 1 with the entire content as a single string.
This is what worked for me that I was able to extract one point and convert it to a geometry
WITH temp_table as (select '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ] }' as string)
select ST_GEOGPOINT(CAST(JSON_EXTRACT(string, '$.coordinates[0][0]') as FLOAT64), CAST(JSON_EXTRACT(string, '$.coordinates[0][1]') as FLOAT64)) from temp_table
This results in POINT(20 10)
I can write manual queries for each of these points and do a UNION ALL
but that won't scale or work every time. I want to achieve this such that it is able to do it in a automated fashion. For architectural purposes, we can't do string manipulation in languages like Python.
Upvotes: 1
Views: 754
Reputation: 173200
Below is for BigQuery Standard SQL
#standardSQL
SELECT
ARRAY(
SELECT ST_GEOGPOINT(
CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64), CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64))
FROM UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair
) points
FROM `project.dataset.temp_table`
You can test, play with above using sample data from your question as in below example
#standardSQL
WITH `project.dataset.temp_table` AS (
SELECT '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ] }' AS STRING
)
SELECT
ARRAY(
SELECT ST_GEOGPOINT(
CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64), CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64))
FROM UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair
) points
FROM `project.dataset.temp_table`
with result
Row points
1 POINT(20 10)
POINT(30 5)
POINT(90 50)
POINT(40 80)
Note: in above version - array of points is produced for each respective original row. Obviously you can adjust it to flatten as in below example
#standardSQL
WITH `project.dataset.temp_table` AS (
SELECT '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ] }' AS STRING
)
SELECT
ST_GEOGPOINT(
CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64), CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64)
) points
FROM `project.dataset.temp_table`, UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair
with result
Row points
1 POINT(20 10)
2 POINT(30 5)
3 POINT(90 50)
4 POINT(40 80)
Upvotes: 4