Reputation: 15379
I'm trying to get a regex to capture the base URL from a URL string. This
^(.+?[^\/:])(?=[?\/]|$)
works. REGEX101
But when I try to use it within postgresql
regexp_replace(content_url,'^(.+?[^\\/:])(?=[?\\/]|$)', '\1')
it does not
Upvotes: 1
Views: 655
Reputation: 15379
PostGreSQL has an interesting regular expression engine. It took me a while to figure out what be escaped and what needs to be double-escaped. The solution that worked for me is:
(regexp_matches(content_url,'(https?:\/\/\\w+(?:\\.\\w+)+)'))[1] AS content_url
Hope this can help someone.
Upvotes: 0
Reputation: 9806
RegexBuddy gives this warning about the first '?'
PostgreSQL is inconsistent in the way it handles lazy quantifiers in regular expressions with alternation because it attempts to match the longest alternative, instead of being eager and accepting the first alternative that matches
and if you remove it, it seems to work, i.e ^(.+[^\/:])(?=[?\/]|$)
however, if you're trying to parse the baseurl that regex won't work. Use this instead:
select regexp_replace('....', '^(.*:)//([a-z\-.]+)(:[0-9]+)?(.*)$', '\2')
Upvotes: 1