Son
Son

Reputation: 179

bigquery standard sql = extracting data from strings

I am looking to extract parts of a string that follow specified letters, for example from the below string:

wc:275,nwc:267,c1.3:2,c12.1:25,c12.10:39,c12.12:21,c12.13:4

I am looking to extract 275 (for wc), 2 (for c1.3) and 25 (for c12.1).

I have tried the following but the fields anew, ridanxietycnt and wordcount just show "NULL".

SELECT substr(CAST((DATE) AS STRING),0,8) as daydate,
count(1) as count,
avg(CAST(REGEXP_REPLACE(V2Tone, r',.*', "")AS FLOAT64)) tone,
avg(CAST(REGEXP_EXTRACT(GCAM, r'c1.3:([-d.]+)')AS FLOAT64)) anew,
sum(CAST(REGEXP_EXTRACT(GCAM, r'c12.1:([-d.]+)')AS FLOAT64)) ridanxietycnt, 
sum(CAST(REGEXP_EXTRACT(GCAM, r'wc:(d+)')AS FLOAT64)) wordcount   
FROM `gdelt-bq.gdeltv2.gkg_partitioned` where _PARTITIONTIME BETWEEN TIMESTAMP('2019-02-02') AND TIMESTAMP('2019-02-02') 
group by daydate

I would expect to see the aggregated number for each column.

I wonder if the issue is with the regex expression?

Upvotes: 0

Views: 299

Answers (2)

saifuddin778
saifuddin778

Reputation: 7277

You can treat them as simple key/value pairs and then do the aggregation on top of it. Something like:

select 
   substr(CAST((DATE) AS STRING),0,8) as daydate,
   split(x,':')[safe_offset(0)] as key, 
   cast(split(x,':')[safe_offset(1)] as float64) as value
from `gdelt-bq.gdeltv2.gkg_partitioned`, 
unnest(split(GCAM, ',')) as x
where _PARTITIONTIME BETWEEN TIMESTAMP('2019-02-02') AND TIMESTAMP('2019-02-02')

Hope this helps.

Upvotes: 1

thsr
thsr

Reputation: 75

This should get you 2 (anew), 25 (ridanxietycnt), 275 (wordcount), in this order

SELECT
SAFE_CAST(REGEXP_EXTRACT('wc:275,nwc:267,c1.3:2,c12.1:25,c12.10:39,c12.12:21,c12.13:4', r'c1.3:(\d+)') as FLOAT64) anew,
SAFE_CAST(REGEXP_EXTRACT('wc:275,nwc:267,c1.3:2,c12.1:25,c12.10:39,c12.12:21,c12.13:4', r'c12.1:(\d+)') as FLOAT64) ridanxietycnt, 
SAFE_CAST(REGEXP_EXTRACT('wc:275,nwc:267,c1.3:2,c12.1:25,c12.10:39,c12.12:21,c12.13:4', r'wc:(\d+)') as FLOAT64) wordcount  

Upvotes: 1

Related Questions