HaloKu
HaloKu

Reputation: 435

How to use regex to extract string in SQL?

Documentation says:

regexp_extract(string, pattern)varchar Returns the first substring matched by the regular expression pattern in string https://prestodb.io/docs/current/functions/regexp.html

I have the following query

select regexp_extract('sssshttps://jira.domain.com/browse/PR-6835hhhh',
'/(https.*\/browse\/)(\w+\-\d+)/g')

This return null when it should return https://jira.domain.com/browse/PR-6835 The regexp seems OK it works in many regexp tools that I checked. Why I can't extract the substring of the url link?

Upvotes: 1

Views: 15356

Answers (1)

Tim Biegeleisen
Tim Biegeleisen

Reputation: 520898

Note that forward slash and dash are not regex metacharacters, and so do not need to be escaped, at least not in Presto regex. Consider this version:

select regexp_extract('sssshttps://jira.domain.com/browse/PR-6835hhhh',
                      'https.*?/browse/\w+-\d+')

This should return:

https://jira.domain.com/browse/PR-6835

Your current regex appears to be coming from some other language, such as JavaScript or PHP, where the regex literal has a delimiter /, therefore requiring that / needs to be escaped.

I just edited my answer to use lazy dot .*? when matching content between https and the first /browse. This should fix the edge case mentioned in your comment where .* is matching across multiple URLs.

Upvotes: 3

Related Questions