Dan
Dan

Reputation: 35

How do I split a string column into multi rows of single words & word pairs in BigQuery SQL?

I am trying (unsuccessfully) to split a string column in Google BigQuery into rows containing all single words and all word pairs (next to each other & in order). I also need to maintain the ID field for the words from the IndataTable. Both recordsets have 2 columns.

IndataTable as IDT
ID WordString
1 apple banana pear
2 carrot
3 blue red green yellow

OutdataTable as ODT
ID WordString
1 apple
1 banana
1 pear
1 apple banana
1 banana pear
2 carrot
3 blue
3 red
3 green
3 yellow
3 blue red
3 red green
3 green yellow (only pairs that are next to each other)

Is this possible in BigQuery SQL?

Edit/Added:
This is what I have so far which works for splitting it up into single words. I am really struggling to figure out how to expand this to word pairs. I don't know if this can be modified for it or I need a new approach altogether.

SELECT ID, split(WordString,' ') as Words
FROM (
  select * 
     from 
     (select ID, WordString from IndataTable)
)

Upvotes: 2

Views: 3694

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172993

Below is for BigQuery Standard SQL

#standardSQL
WITH IndataTable AS (
  SELECT 1 id, 'apple banana pear' WordString UNION ALL
  SELECT 2, 'carrot' UNION ALL
  SELECT 3, 'blue red green yellow' 
), words AS (
  SELECT id, word, pos
  FROM IndataTable, UNNEST(SPLIT(WordString,' ')) AS Word WITH OFFSET pos
), pairs AS (
  SELECT id, CONCAT(word, ' ', LEAD(word) OVER(PARTITION BY id ORDER BY pos)) pair
  FROM words
)
SELECT id, word AS WordString FROM words UNION ALL
SELECT id, pair AS WordString FROM pairs
WHERE NOT pair IS NULL
ORDER BY id  

with result as expected :

Row id  WordString   
1   1   apple    
2   1   banana   
3   1   pear     
4   1   apple banana     
5   1   banana pear  
6   2   carrot   
7   3   blue     
8   3   red  
9   3   green    
10  3   yellow   
11  3   blue red     
12  3   red green    
13  3   green yellow     

Upvotes: 5

Related Questions