Nivi
Nivi

Reputation: 1137

Split a column based on a character in BigQuery

I have a table as shown below on BigQuery

Name | Score
Tim  | 63 > 89 > 90
James| 67 > 44

I want to split the Score column into N separate columns where N is the maximum score length in the entire table. I would like the table to be as follow.

Name| Score_1 | Score_2 | Score_3
Tim | 63      | 89      | 90
James| 67     | 44      | 0 or NA

I tried the Split command but I end up doing a new row for each Name-Score combination.

Upvotes: 0

Views: 599

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172974

For BigQuery Standard SQL

Below is simple case and assumes you know in advance the expected max score length (3 in below example)

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 'Tim' name, '63 > 89 > 90' score UNION ALL
  SELECT 'James', '67 > 44' 
)
SELECT 
  name, 
  score[SAFE_OFFSET(0)] AS score_1,
  score[SAFE_OFFSET(1)] AS score_2,
  score[SAFE_OFFSET(2)] AS score_3
FROM (
  SELECT name, SPLIT(score, ' > ') score
  FROM `project.dataset.your_table`
)  

with result

Row name    score_1 score_2 score_3  
1   Tim     63      89      90       
2   James   67      44      null         

Of course above approach means - if you have many scores - like 10 or 20 or more - you will need to add respective number of extra lines like below

  score[SAFE_OFFSET(20)] AS score_21

So, above gives you what you wanted from schema of output point of view

At the same time, below makes more sense to me and in most practical cases is better and most optimal :

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT 'Tim' name, '63 > 89 > 90' score UNION ALL
  SELECT 'James', '67 > 44' 
)
SELECT name, score
FROM `project.dataset.your_table`, UNNEST(SPLIT(score, ' > ')) score

with result

Row name    score    
1   Tim     63   
2   Tim     89   
3   Tim     90   
4   James   67   
5   James   44   

Upvotes: 2

Related Questions