Pranay Nanda
Pranay Nanda

Reputation: 179

How to extract separate values from GeoJSON in BigQuery

I have a GeoJSON string for a multipoint geometry. I want to extract each of those points to a table of individual point geometries in BigQuery

I have been able to achieve point geometry for one of the points. I want to do it for all the others as well in a automated fashion. I've already tried converting the string to an array but it remains an array of size 1 with the entire content as a single string.

This is what worked for me that I was able to extract one point and convert it to a geometry

WITH temp_table as (select '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ]  }' as string) 

select ST_GEOGPOINT(CAST(JSON_EXTRACT(string, '$.coordinates[0][0]') as FLOAT64), CAST(JSON_EXTRACT(string, '$.coordinates[0][1]') as FLOAT64))  from temp_table

This results in POINT(20 10)

I can write manual queries for each of these points and do a UNION ALL but that won't scale or work every time. I want to achieve this such that it is able to do it in a automated fashion. For architectural purposes, we can't do string manipulation in languages like Python.

Upvotes: 1

Views: 754

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173200

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  ARRAY(
    SELECT ST_GEOGPOINT(
      CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64), CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64)) 
    FROM UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair
  ) points
FROM `project.dataset.temp_table`  

You can test, play with above using sample data from your question as in below example

#standardSQL
WITH `project.dataset.temp_table` AS (
  SELECT '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ]  }' AS STRING
) 
SELECT 
  ARRAY(
    SELECT ST_GEOGPOINT(
      CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64), CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64)) 
    FROM UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair
  ) points
FROM `project.dataset.temp_table`   

with result

Row points   
1   POINT(20 10)     
    POINT(30 5)  
    POINT(90 50)     
    POINT(40 80)     

Note: in above version - array of points is produced for each respective original row. Obviously you can adjust it to flatten as in below example

#standardSQL
WITH `project.dataset.temp_table` AS (
  SELECT '{ "type": "MultiPoint", "coordinates": [ [ 20, 10 ], [ 30, 5 ], [ 90, 50 ], [ 40, 80 ] ]  }' AS STRING
) 
SELECT 
  ST_GEOGPOINT(
      CAST(SPLIT(pair)[OFFSET(0)] AS FLOAT64), CAST(SPLIT(pair)[SAFE_OFFSET(1)] AS FLOAT64)
  ) points
FROM `project.dataset.temp_table`, UNNEST(REGEXP_EXTRACT_ALL(JSON_EXTRACT(STRING, '$.coordinates'), r'\[(\d+,\d+)\]')) pair   

with result

Row points   
1   POINT(20 10)     
2   POINT(30 5)  
3   POINT(90 50)     
4   POINT(40 80)     

Upvotes: 4

Related Questions