Reputation: 410
I want to pull the slug from a URL using SQL regular expressions
https://example.com/december-2019/content/this-is-the-slug/
I eventually came across this
SELECT
regexp_extract(url, r'\/([a-z0-9_-]*[\/]?)$') slug
FROM table
This gives me this-is-the-slug/
However, I just want this-is-the-slug
I know I can just nest my query and remove the end '/', but is there anything I can add to the RegExp '\/([a-z0-9_-]*[\/]?)$'
above that will do it all in one?
Also, how does the RegExp work? I'm a bit confused about how it operates. Is it something like this
$
# start at end of string
/( )
# work back and find the first '/' and create a capturing group ()
[ ]*
# create a character set [] in the capturing group and match all elements in it using *
a-z0-9_-
# match all these characters in the character set and pull them out.
[\/]?
# something to do with greedy matching? Not sure what this does.
Any help on this would be great. Thanks
Upvotes: 0
Views: 940
Reputation: 172974
I would still recommend using REGEXP in cases like yours - your regular expression was very close to what it should of be - you just need to move /?
outside of captured group like in below example
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'https://example.com/december-2019/content/this-is-the-slug1/' url UNION ALL
SELECT 'https://example.com/december-2019/content/this-is-the-slug2'
)
SELECT
REGEXP_EXTRACT(url, r'\/([a-z0-9_-]*?)/?$') slug
FROM `project.dataset.table`
with result
Row slug
1 this-is-the-slug1
2 this-is-the-slug2
Upvotes: 0
Reputation: 520968
Here is a non regex option using the SPLIT()
function:
SELECT ARRAY_REVERSE(SPLIT(RTRIM(url, '/'), '/'))[SAFE_OFFSET(0)]
FROM yourTable;
This approach removes the trailing path separator /
, then splits the URL on /
to generate an array. Note that we use RTRIM
for this purpose. This means that the URL input won't be affected if it doesn't have a trailing path separator. The last entry in that array is retained, which should be the slug.
Upvotes: 1