Daniel Honigsblum

Reputation: 88

Extract a substring from a string

In Google BigQuery, I need to extract the string that sits between domain** and ** as in the example below. The string is in the column "Site_Data".

Can someone help me? Thanks!


Upvotes: 1

Views: 8847

Answers (2)

Elliott Brossard

Reputation: 33745

Do all of the strings have that format? There are a couple of different options, assuming that you always need the third piece when the string is split on the ** delimiter.

1) Use SPLIT, e.g.:

#standardSQL
WITH SampleData AS (
  SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
SELECT SPLIT(site_data, '**')[OFFSET(2)] AS visit_type
FROM SampleData;

2) Use REGEXP_EXTRACT, e.g.:

#standardSQL
WITH SampleData AS (
  SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
SELECT REGEXP_EXTRACT(site_data, r'[^\*]+\*\*[^\*]+\*\*([^\*]+)') AS visit_type
FROM SampleData;
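
If some rows might not follow that format, the SPLIT option could be made more tolerant with SAFE_OFFSET, which returns NULL instead of raising an error when the array has fewer elements than requested. This is only a sketch: the second row in SampleData is a hypothetical malformed value added for illustration and does not come from the question.

```sql
#standardSQL
WITH SampleData AS (
  SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
  UNION ALL
  -- Hypothetical malformed row (assumption, not from the question).
  SELECT 'no delimiters here'
)
SELECT
  site_data,
  -- SAFE_OFFSET yields NULL when the split has fewer than three pieces.
  SPLIT(site_data, '**')[SAFE_OFFSET(2)] AS visit_type
FROM SampleData;
```

With plain OFFSET(2), the malformed row would make the whole query fail with an out-of-bounds error, so SAFE_OFFSET is probably the safer default when the data has not been validated.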

Taking this a step further, if you want to split the domain and the arrival type, you can use SPLIT again:

#standardSQL
WITH SampleData AS (
  SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
SELECT
  SPLIT(visit_type)[OFFSET(0)] AS domain,
  SPLIT(visit_type)[OFFSET(1)] AS arrival_type
FROM (
  SELECT SPLIT(site_data, '**')[OFFSET(2)] AS visit_type
  FROM SampleData
);
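
Note that in the sample string the third piece is www.sport.com,userarriveddirectly, so the second element after splitting on the comma is the field name rather than its value. If the values are needed as well, one possible approach is to split on the comma first and then split each pair on **. This is a sketch that assumes every field is comma-separated and uses ** between name and value; the column names field_name and field_value are made up for illustration.

```sql
#standardSQL
WITH SampleData AS (
  SELECT '756-1__67648582789116,tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS site_data
)
SELECT
  -- Text before the ** in each comma-separated pair.
  SPLIT(pair, '**')[SAFE_OFFSET(0)] AS field_name,
  -- Text after the **, or NULL when the pair has no ** (e.g. the leading ID).
  SPLIT(pair, '**')[SAFE_OFFSET(1)] AS field_value
FROM SampleData, UNNEST(SPLIT(site_data, ',')) AS pair;
```

This produces one row per field, so filtering on field_name = 'domain' or field_name = 'userarriveddirectly' would pick out the individual values.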

Upvotes: 3

Mikhail Berlyant

Reputation: 172993

See the example below:

#standardSQL
WITH yourTable AS (
  SELECT '756-1__6565656565656, tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS Site_Data
)
SELECT 
  REGEXP_EXTRACT(Site_Data, r'domain\*\*(.*)\*\*') AS x,
  Site_Data
FROM yourTable
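
The pattern above captures everything between domain** and the last ** in the string, which for the sample value is www.sport.com,userarriveddirectly. If only the domain itself is wanted, a variation could stop at the next comma or asterisk instead. This is a sketch under the assumption that the domain value never contains a comma or an asterisk; the aliases between_delimiters and domain_only are illustrative names.

```sql
#standardSQL
WITH yourTable AS (
  SELECT '756-1__6565656565656, tagtype**unmapped,domain**www.sport.com,userarriveddirectly**False' AS Site_Data
)
SELECT
  -- Greedy match: runs up to the last ** in the string.
  REGEXP_EXTRACT(Site_Data, r'domain\*\*(.*)\*\*') AS between_delimiters,
  -- Stops at the first comma or asterisk after domain**.
  REGEXP_EXTRACT(Site_Data, r'domain\*\*([^,*]+)') AS domain_only
FROM yourTable;
```

For the sample row, domain_only should come back as www.sport.com. The greedy version may capture more than intended if further name**value fields are ever appended to the string.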

Upvotes: 7
