Reputation: 179
I am trying to run the following query on Google Big Query.
It shows as valid in the console, when running it, I get this error message:
"Array cannot have a null element; error in writing field result"
How do resolve this?
SELECT
ARRAY(
SELECT regexp_extract(x, r'^(.*?)\:')
FROM UNNEST(split(V2Themes,',')) AS x
) AS result
FROM `gdelt-bq.gdeltv2.gkg_partitioned`
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2019-02-02') AND TIMESTAMP('2019-02-02')
Upvotes: 3
Views: 7519
Reputation: 692
maybe you want more than just first character before \:
? here is some tests
select
x,
regexp_extract(x, r'^(.*?)\\:') as only_first_occurrence_nongreedy,
regexp_extract(x, r'^(.*)\\:') as only_first_occurrence_greedy,
regexp_extract_all(x, r'(.*?)\\:') as all_occurrences_nongreedy,
regexp_extract_all(x, r'(.*?)\\:') as all_occurrences_greedy -- same as regexp_extract, but returns array
from
unnest(
array[
struct(r'abc\:' as x),
struct(r'123\:abc\:'),
struct(r'\:'),
struct(r'\\')
]
)
Upvotes: 0
Reputation: 172993
Below is for BigQuery Standard SQL
#standardSQL
SELECT * FROM (
SELECT
ARRAY(
SELECT REGEXP_EXTRACT(x, r'^(.*?)\:')
FROM UNNEST(SPLIT(V2Themes,',')) AS x
WHERE REGEXP_CONTAINS(x, r'^(.*?)\:') -- << removes nulls from array
) AS result
FROM `gdelt-bq.gdeltv2.gkg_partitioned`
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2019-02-02') AND TIMESTAMP('2019-02-02')
)
WHERE ARRAY_LENGTH(result) > 0 -- << filters out output with empty array
Upvotes: 1
Reputation: 1269803
Remove the NULL
value:
ARRAY(
SELECT regexp_extract(x, r'^(.*?)\:')
FROM UNNEST(split(V2Themes,',')) AS x
WHERE regexp_extract(x, r'^(.*?)\:') IS NOT NULL AND
)
Upvotes: 2