Mario M.
Mario M.

Reputation: 872

BigQuery SPLIT with multiple delimiters

I want to make a left split URIs when a character like "?" or "#" appears.

SELECT 
CASE
WHEN URI LIKE '%?%' THEN FIRST(SPLIT(URI, "?"))
WHEN URI LIKE '%#%' THEN FIRST(SPLIT(URI, "#"))
END WITHIN RECORD AS URI_FILTER
FROM (SELECT "/A/A1/AA2/205.html#jfsalf" AS URI)

Output must be: /A/A1/AA2/205.html

Another URI to test:

/A/A1/AA2/205.html?pRef=209888

Error: Scope applied without an aggregation function

Upvotes: 2

Views: 5078

Answers (3)

David Kane
David Kane

Reputation: 143

The SPLIT function returns the data in ARRAY format (nested cells) which can be quite useful for some purposes.

If you desire this format, the alternative is to use the REPLACE operation to convert the various keys (#, %, etc) to a single key you can use in SPLIT.

In the below code, we convert # to ? and then split on ?.

SPLIT( REPLACE( URI, "#", "?" ), "?" )

Upvotes: 4

Elliott Brossard
Elliott Brossard

Reputation: 33745

Use REGEXP_EXTRACT. If it's possible that the URI doesn't have a # or ?, you can use a regular expression that accounts for that case:

#standardSQL
WITH T AS (
  SELECT '/A/A1/AA2/205.html#jfsalf' AS path UNION ALL
  SELECT '/A/A1/AA2/205.html?pRef=209888' AS path UNION ALL
  SELECT '/A/A1/AA2/205.html' AS path
)
SELECT
  REGEXP_EXTRACT(path, r'([^#?]+)') AS left_path
FROM T;

Upvotes: 1

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173028

I think regular expression better fit for this

SELECT 
  URI, REGEXP_EXTRACT(URI, r'(.*?)[?#]')
FROM 
  (SELECT "/A/A1/AA2/205.html#jfsalf" AS URI),
  (SELECT "/A/A1/AA2/205.html?pRef=209888" as URI)

Upvotes: 3

Related Questions