gadhgadhi
gadhgadhi

Reputation: 123

Split string into multiple columns with bigquery

I have a table in BigQuery with millions of rows, and I want to split adx_catg_id column to multiple new columns. Please note that the adx_catg_id column contains an arbitrary number of words separated by space.

This example of Query below can split the adx_catg_id to multiple columns if the string contains only less than five words. I can extend it to support more number of words, but I need to automate it.

SELECT
  TS, str0, str2, str4, str6, str7
  from
  (select REGEXP_EXTRACT(str5, r'^(.*) .*') as str7
  from
  (select SUBSTR (str5, LENGTH(REGEXP_EXTRACT(str5, r'^(.*) .*')) + 2, LENGTH(str5)) as str6
  from
  (select REGEXP_EXTRACT(str3, r'^(.*) .*') as str5
  from
  (select SUBSTR (str3, LENGTH(REGEXP_EXTRACT(str3, r'^(.*) .*')) + 2, LENGTH(str3)) as str4
  from
  (select REGEXP_EXTRACT(str1, r'^(.*) .*') as str3
  from
  (select SUBSTR (str1, LENGTH(REGEXP_EXTRACT(str1, r'^(.*) .*')) + 2, LENGTH(str1)) as str2
  from
  (select REGEXP_EXTRACT(TS, r'^(.*) .*') as str1
  from
  (select SUBSTR(TS, LENGTH(REGEXP_EXTRACT(TS, r'^(.*) .*')) + 2,LENGTH(TS)) as str0
  from 
  (select adx_catg_id TS from [mydataset.conversions])
  ))))))))

How can I loop the above query to generate all words in new columns depending of string length?

Upvotes: 3

Views: 17931

Answers (2)

N.N.
N.N.

Reputation: 3172

Check this out...

SELECT  
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){0}([^\s]*)\s?') as Word0,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){1}([^\s]*)\s?') as Word1,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){2}([^\s]*)\s?') as Word2,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){3}([^\s]*)\s?') as Word3,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){4}([^\s]*)\s?') as Word4,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){5}([^\s]*)\s?') as Word5,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){6}([^\s]*)\s?') as Word6, 
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){7}([^\s]*)\s?') as Word7,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){8}([^\s]*)\s?') as Word8,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){9}([^\s]*)\s?') as Word9,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){10}([^\s]*)\s?') as Word10,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){11}([^\s]*)\s?') as Word11,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){12}([^\s]*)\s?') as Word12,
FROM
(SELECT 'arbitrary number of words separated by space.' as StringToParse)

Or if you want it in reverse order:

SELECT  
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){1}$') as Word1,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){2}$') as Word2,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){3}$') as Word3,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){4}$') as Word4,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){5}$') as Word5,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){6}$') as Word6, 
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){7}$') as Word7,
FROM
(SELECT 'arbitrary number of words separated by space.' as StringToParse)

Its still a fixed number of fields, but coding is simpler and more readable.

Hope this helps

Upvotes: 5

Felipe Hoffa
Felipe Hoffa

Reputation: 59175

Unfortunately there is no easy SPLIT() today in BigQuery - but it's a good feature request.

I like the answer you developed, I'll experiment more with it. For an alternative approach you can also try https://stackoverflow.com/a/18711812/132438.

The best way to automate this in the meantime, could be automating the query generation outside BigQuery.

Upvotes: 0

Related Questions