toreau

Reputation: 660

Removing fragments from URLs

I want to strip fragments (like #foobar) from URLs, but only according to certain rules. Normally a blunt regex would solve the problem:

$url =~ s/#.+//;

but I want it to take several things into account, most notably these transformations:

http://www.example.com/#/           => http://www.example.com/
http://www.example.com/#foo/bar#foo => http://www.example.com/#foo/bar
http://www.example.com/#foo?a=1     => http://www.example.com/#foo?a=1
http://www.example.com/#foo/?a=1    => http://www.example.com/#foo/?a=1

So the rules should be:

1) If the URL ends in /#/, replace that with /.

2) If a # is not followed later in the URL by a / or ?, remove it along with everything after it.

Any ideas on how to handle this properly? Can it be done with a single regex, or should I use a module?

Upvotes: 0

Views: 56

Answers (1)

Miller

Reputation: 35198

The regex s{#(?:/|[^?/]*)$}{} will cover these rules as stated:

  1. If the URL ends in /#/, replace that with /.
  2. If a # is not followed later in the URL by a / or ?, remove it along with everything after it.
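The compact pattern can be hard to read at a glance, so here is a rough sketch of the same substitution written with the /x modifier and one comment per piece. The helper name strip_fragment is hypothetical and only there for illustration; the pattern itself is unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer) wrapping the substitution.
sub strip_fragment {
    my ($url) = @_;
    $url =~ s{
        \#              # a literal hash sign (escaped, because /x treats a bare one as a comment start)
        (?:
            /           # rule 1: a single slash, as in a URL ending with "/#/"
          |
            [^?/]*      # rule 2: any run of characters that contains no "?" and no "/"
        )
        $               # in both cases the match has to reach the end of the string
    }{}x;
    return $url;
}

print strip_fragment('http://www.example.com/#foo/bar#foo'), "\n";
```

Because both alternatives are anchored to the end of the string, a # that is followed by a / or ? somewhere later can only be skipped, never removed, which is why the first # in the example above survives.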

And a small test script to demonstrate:

use strict;
use warnings;

use Test;

BEGIN { plan tests => 4 }

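# Each DATA line has the form "source => expected result".
while (<DATA>) {
    chomp;
    my ($source, $goal) = split /\s*=>\s*/;

    # Strip a trailing "#/" or a trailing fragment containing no "/" or "?".
    $source =~ s{#(?:/|[^?/]*)$}{};

    ok($source, $goal);    # Test.pm: compares the result with the expected value
}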

__DATA__
http://www.example.com/#/           => http://www.example.com/
http://www.example.com/#foo/bar#foo => http://www.example.com/#foo/bar
http://www.example.com/#foo?a=1     => http://www.example.com/#foo?a=1
http://www.example.com/#foo/?a=1    => http://www.example.com/#foo/?a=1

Output:

1..4
# Running under perl version 5.018002 for MSWin32
# Current time local: Fri May 30 15:01:04 2014
# Current time GMT:   Fri May 30 22:01:04 2014
# Using Test.pm version 1.26
ok 1
ok 2
ok 3
ok 4
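The four cases above come straight from the question. As a hedged sketch of how the same pattern treats a few other inputs, the script below uses Test::More instead of Test.pm (a swap made here for convenience, not something the original answer does). The extra URLs and the helper name strip_fragment are assumptions made up for illustration, and the expected values follow from the regex as written rather than from any stated requirement.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper wrapping the substitution from the answer.
sub strip_fragment {
    my ($url) = @_;
    $url =~ s{#(?:/|[^?/]*)$}{};
    return $url;
}

# Illustrative inputs (assumptions, not cases from the question).
my @cases = (
    # ordinary trailing fragment: removed
    [ 'http://www.example.com/page#section', 'http://www.example.com/page' ],
    # fragment after a query string: removed, query kept
    [ 'http://www.example.com/?q=1#top',     'http://www.example.com/?q=1' ],
    # bare trailing hash sign: removed
    [ 'http://www.example.com/#',            'http://www.example.com/' ],
    # two fragments with no slash or question mark between them: both removed
    [ 'http://www.example.com/#a#b',         'http://www.example.com/' ],
    # "#/" that is not at the end of the URL: left alone
    [ 'http://www.example.com/#/foo',        'http://www.example.com/#/foo' ],
    # no fragment at all: unchanged
    [ 'http://www.example.com/',             'http://www.example.com/' ],
);

for my $case (@cases) {
    my ($source, $goal) = @$case;
    is(strip_fragment($source), $goal, "input: $source");
}

done_testing();
```

Whether these results are the desired ones depends on rules the question does not spell out, so treat them as a description of the regex's behavior rather than a specification.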

Upvotes: 1
