Is there a way to remove characters in an array of string in BigQuery?

Question

Using the GDELT public database in Google query, I am trying to find the top themes associated with Israeli Prime Minister Benjamin Netanyahu around March 3, 2015.

I used the following SQL query

SELECT theme, COUNT(*) as count
FROM (
select REGEXP_REPLACE(SPLIT(V2Themes,';'), r',.*', '') theme
from `gdelt-bq.gdeltv2.gkg`
where DATE>20150302000000 and DATE < 20150304000000 and V2Persons like '%Netanyahu%'
)
group by theme
ORDER BY 2 DESC
LIMIT 300

The split is necessary because the V2Themes column uses nested listings. I then want to remove the character offset. This should give me the following:

Row theme               count    
1   GENERAL_GOVERNMENT  33677    
2   LEADER              33405    
3   TAX_FNCACT_MINISTER 31174
4   ....                 ...

But I get an error instead:

No matching signature for function REGEXP_REPLACE for argument types: ARRAY, STRING, STRING.

I understand that SPLIT() creates an array of strings, but I do not know how to resolve this issue.

Is there another function I should use, or can this problem be resolved in another way?

*EDIT

The query works when it is run with Legacy SQL (also need to change the quotation marks to squared brackets). How can I achieve the same using standard SQL?

Mikhail Berlyant · Accepted Answer

Below is for BigQuery Standard SQL

#standardSQL
SELECT theme, COUNT(*) AS COUNT
FROM (
  SELECT REGEXP_REPLACE(value, r',.*', '') theme
  FROM `gdelt-bq.gdeltv2.gkg`, UNNEST(SPLIT(V2Themes,';')) value
  WHERE DATE>20150302000000 AND DATE < 20150304000000 
  AND V2Persons LIKE '%Netanyahu%'
)
GROUP BY theme
ORDER BY 2 DESC
LIMIT 300

Be aware of following: when you run query and it fails - the cost for you is zero (in most common cases).

This is not a case when your query is correct (as above one in this answer) - in such cases query will successfully complete and you will be billed as per

in Classic UI:

or in BigQuery Console:

Is there a way to remove characters in an array of string in BigQuery?

Answers (1)

Related Questions