Reputation: 88
In google-bigquery, I need to pull the string that is between domain** and ** as in the example bellow The string is under the column "Site_Data"
Can someone help me? 10x!
Upvotes: 1
Views: 8847
Reputation: 33745
Do all of the strings have that format? There are a couple of different options, assuming that you always need the third string after the **
delimiter.
1) Use SPLIT
, e.g.:
#standardSQL
WITH SampleData AS (
SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
SELECT SPLIT(site_data, '**')[OFFSET(2)] AS visit_type
FROM SampleData;
2) Use REGEXP_EXTRACT
, e.g.:
#standardSQL
WITH SampleData AS (
SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
SELECT REGEXP_EXTRACT(site_data, r'[^\*]+\*\*[^\*]+\*\*([^\*]+)') AS visit_type
FROM SampleData;
Taking this a step further, if you want to split the domain and the arrival type, you can use SPLIT
again:
#standardSQL
WITH SampleData AS (
SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
SELECT
SPLIT(visit_type)[OFFSET(0)] AS domain,
SPLIT(visit_type)[OFFSET(1)] AS arrival_type
FROM (
SELECT SPLIT(site_data, '**')[OFFSET(2)] AS visit_type
FROM SampleData
);
Upvotes: 3
Reputation: 172993
See example below
#standardSQL
WITH yourTable AS (
SELECT '756-1__6565656565656, tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS Site_Data
)
SELECT
REGEXP_EXTRACT(Site_Data, r'domain\*\*(.*)\*\*') AS x,
Site_Data
FROM yourTable
Upvotes: 7