Reputation: 1158
I know there are a lot of questions about lazy regex matching, but none of the solutions I've seen have worked. Here's the issue:
Some addresses in my BigQuery results look like this:
www.example.comwww.example.com/path/to/page
apply.example.comapply.example.com/eapp/
I want to strip off the duplicated part to get
www.example.com/path/to/page
apply.example.com/eapp/
I've tried using REGEX_REPLACE()
like this:
REGEXP_REPLACE(raw.query.tidy_Landing, r'(.*?)\.com','') AS Landing_Page
But that still finds both matches and so returns
/path/to/page
/eapp/
What's wrong with my regex?
Upvotes: 0
Views: 1567
Reputation: 33705
I wanted to see if it's possible to do this without a regex, although it ends up being a bit verbose :) This answer assumes that the address is always duplicated and ends with .com
. Assuming that's the case, it should be possible to use SPLIT
to extract the parts that you are interested in:
SELECT
CONCAT(
SPLIT(text, '.com')[OFFSET(0)],
'.com',
SPLIT(text, '.com')[OFFSET(2)]
) AS Landing_Page
FROM (
SELECT 'www.example.comwww.example.com/path/to/page' AS text UNION ALL
SELECT 'apply.example.comapply.example.com/eapp/'
);
If you want the query to be tolerant of non-duplicated addresses, you can make a slight modification:
SELECT
(
SELECT
CONCAT(
parts[OFFSET(0)],
'.com',
parts[OFFSET(ARRAY_LENGTH(parts) - 1)]
)
FROM (SELECT SPLIT(text, '.com') AS parts)
) AS Landing_Page
FROM (
SELECT 'www.example.comwww.example.com/path/to/page' AS text UNION ALL
SELECT 'apply.example.comapply.example.com/eapp/' UNION ALL
SELECT 'www.example.com/path/to/page'
);
Going one step further, you can extract the logic into a UDF:
CREATE TEMP FUNCTION GetLandingPage(text STRING) AS (
(
SELECT
CONCAT(
parts[OFFSET(0)],
'.com',
parts[OFFSET(ARRAY_LENGTH(parts) - 1)]
)
FROM (SELECT SPLIT(text, '.com') AS parts)
)
);
SELECT
GetLandingPage(text) AS Landing_Page
FROM (
SELECT 'www.example.comwww.example.com/path/to/page' AS text UNION ALL
SELECT 'apply.example.comapply.example.com/eapp/' UNION ALL
SELECT 'www.example.com/path/to/page'
);
Upvotes: 1
Reputation: 172974
#standardSQL
WITH t AS (
SELECT 'www.example.comwww.example.com/path/to/page' str UNION ALL
SELECT 'apply.example.comapply.example.com/eapp/'
)
SELECT str, REGEXP_REPLACE(str, r'^(.*?\.com)', '') fix
FROM t
output is
str fix
www.example.comwww.example.com/path/to/page www.example.com/path/to/page
apply.example.comapply.example.com/eapp/ apply.example.com/eapp/
Upvotes: 2