Saumil Agrawal
Saumil Agrawal

Reputation: 351

BigQuery: SPLIT() returns only one value

I have a page URL column components of which are delimited by /. I tried to run the SPLIT() function in BigQuery but it only gives the first value. I want all values in specific columns.

I don't understand how to use the Regexp_extract() example mentioned in Split string into multiple columns with bigquery.

I need something similar to REGEX_SPLIT_TO_TABLE(<String>, <DELIMITER>) which converts a single string into multiple columns.

Query:

SELECT PK, 
DATE(TIMESTAMP(CONCAT(SUBSTR(date,1,4),'-',SUBSTR(date,5,2),'-',SUBSTR(date,7,2),' 00:00:00'))) as visit_date,
hits_page_pagePath,
split(hits_page_pagePath,'/')
FROM [Intent.All2mon] limit 100

Upvotes: 33

Views: 208410

Answers (5)

Felipe Hoffa
Felipe Hoffa

Reputation: 59175

2018 standardSQL update:

#standardSQL
SELECT SPLIT(path, '/')[OFFSET(0)] part1,
       SPLIT(path, '/')[OFFSET(1)] part2,
       SPLIT(path, '/')[OFFSET(2)] part3
FROM (SELECT "/a/b/aaaa?c" path)

Now I understand you want them in different columns.

An alternative to the query you provided:

SELECT FIRST(SPLIT(path, '/')) part1,
       NTH(2, SPLIT(path, '/')) part2,
       NTH(3, SPLIT(path, '/')) part3
FROM (SELECT "/a/b/aaaa?c" path)

NTH(X, SPLIT(s)) will provide the Xth value from the SPLIT. FIRST(s) is the same as NTH(1, s)

Upvotes: 83

LondonUK
LondonUK

Reputation: 507

You can also try the following with SPLIT function, however you need to know how many '/' your url would have or make enough entries so that if your url contains more '/' then you can still get those values in the seperate columns

  SPLIT(`url`, '/')[safe_ordinal(1)] AS `Col1`, 
  SPLIT(`url`, '/')[safe_ordinal(2)] AS `Col2`,
  SPLIT(`url`, '/')[safe_ordinal(3)] AS `Col3`, 
  SPLIT(`url`, '/')[safe_ordinal(4)] AS `Col4`,
  .
  .
  SPLIT(`url`, '/')[safe_ordinal(N)] AS `ColN`

Upvotes: 10

marengaz
marengaz

Reputation: 1769

in standard sql, you can use the following functions:

array[OFFSET(zero_based_offset)]
array[ORDINAL(one_based_ordinal)]

so

SELECT SPLIT(path, '/')[OFFSET(1)] part2,
       SPLIT(path, '/')[ORDINAL(2)] part2_again,
       SPLIT(path, '/')[ORDINAL(3)] part3
FROM (SELECT "/a/b/aaaa?c" path)

part2   part2_again part3    
a       a           b

part1 in this case, is empty string (before the first slash)

Upvotes: 2

Saumil Agrawal
Saumil Agrawal

Reputation: 351

Solved it in a way.

   SELECT
   date, 
   hits_time, 
   fullVisitorId, 
   visitNumber, 
   hits_hitNumber,
   X.page_path,
   REGEXP_EXTRACT(X.page_path,r'/(\w*)\/') as one,
   REGEXP_EXTRACT(X.page_path,r'/\w*\/(\w*)') as two,
   REGEXP_EXTRACT(X.page_path,r'/\w*\/\w*\/(\w*)') as three,
   REGEXP_EXTRACT(X.page_path,r'/\w*/\w*/\w*\/(\w*)\/.*') as four
   from
   (
   select 
   date, hits_time, fullVisitorId, visitNumber, hits_hitNumber,
   REGEXP_REPLACE (hits_page_pagePath, '-', '') as page_path
   from
   [Intent.All2mon]
   ) X 
   limit 1000

Upvotes: 0

Felipe Hoffa
Felipe Hoffa

Reputation: 59175

This works for me:

SELECT SPLIT(path, '/') part
FROM (SELECT "/a/b/aaaa?c" path)

Row part     
1   a    
2   b    
3   aaaa?c

Not sure why it wouldn't work for you. What does your data look like?

Upvotes: 0

Related Questions