Pierre-Alexandre
Pierre-Alexandre

Reputation: 775

Split a string and create a new column with the result of that split using BigQuery

I have a BigQuery students table that contains informations regarding students. The structure of the table is the following:

id - SRING
gpa - STRING
classesA - STRING
classesB - STRING

sample rows

id - "STU03827"
gpa - "3.80"
classesA - "history, math, physics"
classesB - "french, chinese" 
-----------
id - "STU29025"
gpa - "3.21"
classesA - ""
classesB - "spanish, italian" 

expected output

id - "STU03827"
gpa - "3.80"
classesA - "history, math, physics"
classesB - "french, chinese" 
classID - "history"
-----------
id - "STU03827"
gpa - "3.80"
classesA - "history, math, physics"
classesB - "french, chinese" 
classID - "math"
-----------
id - "STU03827"
gpa - "3.80"
classesA - "history, math, physics"
classesB - "french, chinese" 
classID - "physics"
-----------
id - "STU03827"
gpa - "3.80"
classesA - "history, math, physics"
classesB - "french, chinese" 
classID - "french"
-----------
id - "STU03827"
gpa - "3.80"
classesA - "history, math, physics"
classesB - "french, chinese" 
classID - "chinese"
-----------
id - "STU29025"
gpa - "3.21"
classesA - ""
classesB - "spanish, italian" 
classID - "spanish"
-----------
id - "STU29025"
gpa - "3.21"
classesA - ""
classesB - "spanish, italian" 
classID - "italian"

What I am trying to achieve is using SELECT and WITH AS, selecting the students table and add a new column which is classId. classId correspond to the current iteration. For instance, if a given record has a total of 5 items between classesA and classesB, I should see that record 5 types with only a single difference, the classId.

What I have so far

SELECT id as id,
classesA as classesA ,
classesB as classesB ,
SPLIT(classesA , ",") as classes
FROM students
WHERE classesA <> ""
UNION ALL
SELECT id as id,
classesA as classesA ,
classesB as classesB ,
SPLIT(classesB, ",") as classes
FROM students
WHERE classesB <> ""

Upvotes: 0

Views: 52

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173190

alternative option

select * from students, 
unnest(split(classesA, ', ') || split(classesB, ', ')) as classID
where classID != ''              

with output

enter image description here

Upvotes: 1

Daniel Zagales
Daniel Zagales

Reputation: 3032

I think this is what you are trying to achieve:

with sample as (
    select "STU03827" as id, "3.80" as gpa, "history, math, physics" as classesA, "french, chinese" as classesB
    union all 
    select "STU29025", "3.21", "", "spanish, italian"
)

select sample.*, classId 
from sample, UNNEST(split(concat(classesA,", ", classesB ))) as classId
where classId <> "";

Upvotes: 1

Related Questions