KeepLearn
KeepLearn

Reputation: 308

BigQuery Regular Expression

I have a table with the following data. Trying to extract second filed if we split with "_" and it should contain [numbers-numbers|numbers-numbers]. Tried with regexp_extract but it was not able to get the desired result.

Please suggest how to achieve this.

Data:

                                             output 
D22_022-010|022-009_84233|669250    345     022-010 172.5
D22_022-010|022-009_666249|843250   22      022-009 172.5
D28I_28-04_5042|44182_250           235     022-010 11
D22_022-010|022-009_8423250         232     022-009 11
D23_23-06_NA_FW27_D23_600           22      28-04   235
D21_21-08_NA_FW14_D21_50            56      022-010 116
D23_23-06_NA_FW27_D23_90            88      022-009 116
D21_21-08_NA_FW14_D21_50            99      23-06   22
G | TR | Search : 56021             89      21-08   56
Free Sprayer_1x1(3.30)              77      23-06   88
Click Tracker (5.4)                 33      23-06   99
6.1 FW18_D28o_Click                 4       21-08   89
                                            null    77
                                            null    33
                                            null    4  

Table Data

Upvotes: 0

Views: 1286

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172974

Below is for BigQuery Standard SQL

Assuming your columns are ad and value below should do what you asked

#standardSQL
SELECT item, ROUND(IFNULL(value / ARRAY_LENGTH(items), value)) AS split_value
FROM (
  SELECT value, 
    SPLIT(REGEXP_EXTRACT(ad, '_((?:[0-9]+-[0-9]+)(?:\\|(?:[0-9]+-[0-9]+))*)'),'|') AS items
  FROM `yourProject.yourDataset.yourTable`
) LEFT JOIN UNNEST(items) AS item   

You can please test this with below dummy data from your question

#standardSQL
WITH `yourTable` AS (
  SELECT 'D22_022-010|022-009_84233|669250' AS ad, 345 AS value UNION ALL
  SELECT 'D22_022-010|022-009_666249|843250', 22 UNION ALL
  SELECT 'D28I_28-04_5042|44182_250', 235 UNION ALL
  SELECT 'D22_022-010|022-009_8423250', 232 UNION ALL
  SELECT 'D23_23-06_NA_FW27_D23_600', 22 UNION ALL 
  SELECT 'D21_21-08_NA_FW14_D21_50', 56 UNION ALL 
  SELECT 'D23_23-06_NA_FW27_D23_90', 88 UNION ALL 
  SELECT 'D21_21-08_NA_FW14_D21_50', 99 UNION ALL 
  SELECT 'G | TR | Search : 56021', 89 UNION ALL 
  SELECT 'Free Sprayer_1x1(3.30)', 77 UNION ALL 
  SELECT 'Click Tracker (5.4)', 33 UNION ALL 
  SELECT '6.1 FW18_D28o_Click', 4 
)
SELECT item, ROUND(IFNULL(value / ARRAY_LENGTH(items), value)) AS split_value
FROM (
  SELECT value, 
    SPLIT(REGEXP_EXTRACT(ad, '_((?:[0-9]+-[0-9]+)(?:\\|(?:[0-9]+-[0-9]+))*)'),'|') AS items
  FROM `yourTable`
) LEFT JOIN UNNEST(items) AS item   

The result is (as you would expect)

item    split_value  
------- -----------
022-010       173.0  
022-009       173.0  
022-010        11.0  
022-009        11.0  
28-04         235.0  
022-010       116.0  
022-009       116.0  
23-06          22.0  
21-08          56.0  
23-06          88.0  
21-08          99.0  
null           89.0  
null           77.0  
null           33.0  
null            4.0   

Upvotes: 1

Related Questions