Cristian Avendaño
Cristian Avendaño

Reputation: 477

Regex to filter data in BigQuery

I have a field in a table, with data like the following:

a) Health and Medicines
b) {'name': 'Health and Medicines', 'url': 'https://www.testurl.com/c/health-medicine'}

I need to clean this data, and let only values like "Health and Medicines" I'm trying to do this with REGEX_EXTRACT in BigQuery, but I still cannot make it work.

So far I have the following regex:

(?:{'name': ')(.*)(?:')(?:, 'url': )(?:.)*(?:})

But works only when data has the tags {'name':...

How can I clean this data?

Thanks!

Upvotes: 0

Views: 226

Answers (1)

Ricco D
Ricco D

Reputation: 7277

You can use JSON_EXTRACT_SCALAR since you have JSON data. When the field does not contain JSON data, then use REGEX_EXTRACT. See approach below:

NOTE: Modify regex to capture your other use cases.

with sample_data as (
select 'Health and Medicines' as test_field
union all select "{'name': 'Health and Medicines', 'url': 'https://www.testurl.com/c/health-medicine'}" as test_field
)

select 
  test_field,
  ifnull(JSON_EXTRACT_SCALAR(test_field,'$.name'), regexp_extract(test_field, r'\w+\s\w+\s\w+')) as extracted_value
from sample_data

See approach below:

enter image description here

Upvotes: 1

Related Questions