Reputation: 351
I have a page URL column components of which are delimited by /
. I tried to run the SPLIT()
function in BigQuery but it only gives the first value. I want all values in specific columns.
I don't understand how to use the Regexp_extract()
example mentioned in Split string into multiple columns with bigquery.
I need something similar to REGEX_SPLIT_TO_TABLE(<String>, <DELIMITER>)
which converts a single string into multiple columns.
Query:
SELECT PK,
DATE(TIMESTAMP(CONCAT(SUBSTR(date,1,4),'-',SUBSTR(date,5,2),'-',SUBSTR(date,7,2),' 00:00:00'))) as visit_date,
hits_page_pagePath,
split(hits_page_pagePath,'/')
FROM [Intent.All2mon] limit 100
Upvotes: 33
Views: 208410
Reputation: 59175
#standardSQL
SELECT SPLIT(path, '/')[OFFSET(0)] part1,
SPLIT(path, '/')[OFFSET(1)] part2,
SPLIT(path, '/')[OFFSET(2)] part3
FROM (SELECT "/a/b/aaaa?c" path)
Now I understand you want them in different columns.
An alternative to the query you provided:
SELECT FIRST(SPLIT(path, '/')) part1,
NTH(2, SPLIT(path, '/')) part2,
NTH(3, SPLIT(path, '/')) part3
FROM (SELECT "/a/b/aaaa?c" path)
NTH(X, SPLIT(s))
will provide the Xth value from the SPLIT. FIRST(s)
is the same as NTH(1, s)
Upvotes: 83
Reputation: 507
You can also try the following with SPLIT function, however you need to know how many '/' your url would have or make enough entries so that if your url contains more '/' then you can still get those values in the seperate columns
SPLIT(`url`, '/')[safe_ordinal(1)] AS `Col1`,
SPLIT(`url`, '/')[safe_ordinal(2)] AS `Col2`,
SPLIT(`url`, '/')[safe_ordinal(3)] AS `Col3`,
SPLIT(`url`, '/')[safe_ordinal(4)] AS `Col4`,
.
.
SPLIT(`url`, '/')[safe_ordinal(N)] AS `ColN`
Upvotes: 10
Reputation: 1769
in standard sql, you can use the following functions:
array[OFFSET(zero_based_offset)]
array[ORDINAL(one_based_ordinal)]
so
SELECT SPLIT(path, '/')[OFFSET(1)] part2,
SPLIT(path, '/')[ORDINAL(2)] part2_again,
SPLIT(path, '/')[ORDINAL(3)] part3
FROM (SELECT "/a/b/aaaa?c" path)
part2 part2_again part3
a a b
part1
in this case, is empty string (before the first slash)
Upvotes: 2
Reputation: 351
Solved it in a way.
SELECT
date,
hits_time,
fullVisitorId,
visitNumber,
hits_hitNumber,
X.page_path,
REGEXP_EXTRACT(X.page_path,r'/(\w*)\/') as one,
REGEXP_EXTRACT(X.page_path,r'/\w*\/(\w*)') as two,
REGEXP_EXTRACT(X.page_path,r'/\w*\/\w*\/(\w*)') as three,
REGEXP_EXTRACT(X.page_path,r'/\w*/\w*/\w*\/(\w*)\/.*') as four
from
(
select
date, hits_time, fullVisitorId, visitNumber, hits_hitNumber,
REGEXP_REPLACE (hits_page_pagePath, '-', '') as page_path
from
[Intent.All2mon]
) X
limit 1000
Upvotes: 0
Reputation: 59175
This works for me:
SELECT SPLIT(path, '/') part
FROM (SELECT "/a/b/aaaa?c" path)
Row part
1 a
2 b
3 aaaa?c
Not sure why it wouldn't work for you. What does your data look like?
Upvotes: 0