Reputation: 477
I have a field in a table, with data like the following:
a) Health and Medicines
b) {'name': 'Health and Medicines', 'url': 'https://www.testurl.com/c/health-medicine'}
I need to clean this data, and let only values like "Health and Medicines" I'm trying to do this with REGEX_EXTRACT in BigQuery, but I still cannot make it work.
So far I have the following regex:
(?:{'name': ')(.*)(?:')(?:, 'url': )(?:.)*(?:})
But works only when data has the tags {'name':...
How can I clean this data?
Thanks!
Upvotes: 0
Views: 226
Reputation: 7277
You can use JSON_EXTRACT_SCALAR since you have JSON data. When the field does not contain JSON data, then use REGEX_EXTRACT. See approach below:
NOTE: Modify regex to capture your other use cases.
with sample_data as (
select 'Health and Medicines' as test_field
union all select "{'name': 'Health and Medicines', 'url': 'https://www.testurl.com/c/health-medicine'}" as test_field
)
select
test_field,
ifnull(JSON_EXTRACT_SCALAR(test_field,'$.name'), regexp_extract(test_field, r'\w+\s\w+\s\w+')) as extracted_value
from sample_data
See approach below:
Upvotes: 1