Reputation: 435
Documentation says:
regexp_extract(string, pattern)
→ varchar
Returns the first substring matched by the regular expression pattern in string
https://prestodb.io/docs/current/functions/regexp.html
I have the following query
select regexp_extract('sssshttps://jira.domain.com/browse/PR-6835hhhh',
'/(https.*\/browse\/)(\w+\-\d+)/g')
This return null
when it should return https://jira.domain.com/browse/PR-6835
The regexp seems OK it works in many regexp tools that I checked.
Why I can't extract the substring of the url link?
Upvotes: 1
Views: 15356
Reputation: 520898
Note that forward slash and dash are not regex metacharacters, and so do not need to be escaped, at least not in Presto regex. Consider this version:
select regexp_extract('sssshttps://jira.domain.com/browse/PR-6835hhhh',
'https.*?/browse/\w+-\d+')
This should return:
https://jira.domain.com/browse/PR-6835
Your current regex appears to be coming from some other language, such as JavaScript or PHP, where the regex literal has a delimiter /
, therefore requiring that /
needs to be escaped.
I just edited my answer to use lazy dot .*?
when matching content between https
and the first /browse
. This should fix the edge case mentioned in your comment where .*
is matching across multiple URLs.
Upvotes: 3