Reputation: 660
I want to strip fragments (like #foobar) from URLs, but only according to certain rules. Normally a blunt regex would solve the problem:
$url =~ s/#.+//;
but I want it to take several things into consideration, most notably these transformations:
http://www.example.com/#/ => http://www.example.com/
http://www.example.com/#foo/bar#foo => http://www.example.com/#foo/bar
http://www.example.com/#foo?a=1 => http://www.example.com/#foo?a=1
http://www.example.com/#foo/?a=1 => http://www.example.com/#foo/?a=1
So the rules should be:
1) If /#/, just replace it with /.
2) If # is not followed later in the URL by a / or ?, remove it (together with the text after it).
Any ideas on how to handle this properly? A single regex, or some other module?
Upvotes: 0
Views: 56
Reputation: 35198
The regex s{#(?:/|[^?/]*)$}{} will cover these rules as stated:
1) If /#/, just replace it with /.
2) If # is not followed later in the URL by a / or ?, remove it.
And the test suite to demonstrate:
use strict;
use warnings;
use Test;
BEGIN { plan tests => 4 }
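while (<DATA>) {
    chomp;
    # Each DATA line has the form "source => expected result"
    my ($source, $goal) = split /\s*=>\s*/;
    # Strip a trailing "#/" or a trailing fragment containing no "/" or "?"
    $source =~ s{#(?:/|[^?/]*)$}{};
    ok($source, $goal);
}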
__DATA__
http://www.example.com/#/ => http://www.example.com/
http://www.example.com/#foo/bar#foo => http://www.example.com/#foo/bar
http://www.example.com/#foo?a=1 => http://www.example.com/#foo?a=1
http://www.example.com/#foo/?a=1 => http://www.example.com/#foo/?a=1
Output:
1..4
# Running under perl version 5.018002 for MSWin32
# Current time local: Fri May 30 15:01:04 2014
# Current time GMT: Fri May 30 22:01:04 2014
# Using Test.pm version 1.26
ok 1
ok 2
ok 3
ok 4
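To see why the pattern satisfies both rules, it may help to spell it out with the /x flag. This is only a sketch: the sub name strip_fragment is made up for illustration and does not come from any module, and the pattern is meant to be the same one as above, just with whitespace and comments added.

```perl
use strict;
use warnings;

# Hypothetical helper; the name is illustrative only.
sub strip_fragment {
    my ($url) = @_;
    $url =~ s{
        \#            # a literal '#', escaped because /x would start a comment here
        (?:
            /         # rule 1: a single '/' directly after the '#'
          |
            [^?/]*    # rule 2: any run of characters with no '?' and no '/'
        )
        $             # either way, the match must reach the end of the string
    }{}x;
    return $url;
}

print strip_fragment('http://www.example.com/#foo/bar#foo'), "\n";
# prints http://www.example.com/#foo/bar
```

Because the match is anchored at the end, a # that is followed anywhere later by a / or ? can never match, except in the special case where the URL ends in #/. As in the test suite, chomp the input first, since a trailing newline would otherwise be swallowed by the character class.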
Upvotes: 1