CurtLH
CurtLH

Reputation: 2417

postgres regex_replace everything between the third and last forward slash

I have some URLs within a Postgres table that currently look like this:

http://blue.website.com/categoryA/sectionA/title/6534566234
http://yellow.website.com/categoryA/sectionB/title/36476586
http://green.website.com/categoryB/sectionB/title/34646634

I am trying to create a regex that can be used to transform the URLs to look like this:

http://blue.website.com/6534566234
http://yellow.website.com/36476586
http://green.website.com/34646634

I was able to figure out how to use regex_replace to replace everything up to the last / with nothing:

select regexp_replace('http://blue.website.com/categoryA/sectionA/title/6534566234', '^.*/', '')

regexp_replace|
--------------|
6534566234    | 

However, I'm not sure how to extend this to replace everything between the 3rd / and the last / with nothing.

Upvotes: 1

Views: 604

Answers (1)

Bohemian
Bohemian

Reputation: 425083

Replace with blank (ie "remove") each term separately, using look arounds to target the right parts:

select regexp_replace('http://blue.website.com/categoryA/sectionA/title/6534566234', '(?<!/)/[^/]+(?=/)', '', 'g')

See live demo.

The regex (?<!/)/[^/]+(?=/) works as follows:

  • /[^/]+ matches a slash followed by non-slashes
  • (?<!/) the preceding char is not a slash. This prevents matching the part after //, which you want to keep
  • (?=/) requires the following char to be a slash. This prevents matching the final term, which you want to keep

The final parameter 'g' is the global flag - means replace all matches (not just the first, as would be the case if this parameter were not specified)

Upvotes: 2

Related Questions