1252748
1252748

Reputation: 15379

Why won't this regex work in postgreSQL

I'm trying to get a regex to capture the base URL from a URL string. This

^(.+?[^\/:])(?=[?\/]|$)

works. REGEX101

But when I try to use it within postgresql

regexp_replace(content_url,'^(.+?[^\\/:])(?=[?\\/]|$)', '\1') 

it does not

Upvotes: 1

Views: 655

Answers (2)

1252748
1252748

Reputation: 15379

PostGreSQL has an interesting regular expression engine. It took me a while to figure out what be escaped and what needs to be double-escaped. The solution that worked for me is:

(regexp_matches(content_url,'(https?:\/\/\\w+(?:\\.\\w+)+)'))[1] AS content_url

Hope this can help someone.

Upvotes: 0

gordy
gordy

Reputation: 9806

RegexBuddy gives this warning about the first '?'

PostgreSQL is inconsistent in the way it handles lazy quantifiers in regular expressions with alternation because it attempts to match the longest alternative, instead of being eager and accepting the first alternative that matches

and if you remove it, it seems to work, i.e ^(.+[^\/:])(?=[?\/]|$)

however, if you're trying to parse the baseurl that regex won't work. Use this instead:

select regexp_replace('....', '^(.*:)//([a-z\-.]+)(:[0-9]+)?(.*)$', '\2')

Upvotes: 1

Related Questions