Omiod

Reputation: 11623

URL parsing and simplification in PHP

I'm parsing the links found on webpages, and I'm looking for a way to convert URLs like this:

http://www.site.com/./eng/.././disclaimer/index.htm

to the equivalent and more correct

http://www.site.com/disclaimer/index.htm

mainly to avoid duplicates.

Thank you.

Upvotes: 3

Views: 773

Answers (2)

user187291

Reputation: 53940

Like this:

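function simplify($path) {
   $r = array();
   foreach(explode('/', $path) as $p) {
      if($p == '..')
        array_pop($r);             // step up one directory
      else if($p != '.' && strlen($p))
        $r[] = $p;                 // keep real segments, skip '.' and empty ones
   }
   $r = implode('/', $r);
   // restore the leading slash; the empty-string check avoids a notice on ''
   // note: a trailing slash in $path is not preserved
   if($path !== '' && $path[0] == '/') $r = "/$r";
   return $r;
}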

And this is how you use it:

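$u = parse_url($dirtyUrl);
// parse_url() omits 'path' for URLs like http://www.site.com, so default to '/'
$u['path'] = simplify(isset($u['path']) ? $u['path'] : '/');
// note: this rebuild drops any port, query string and fragment
$clean_url = "{$u['scheme']}://{$u['host']}{$u['path']}";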
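
If the links you collect carry a port or a query string, the three lines above will throw them away, and two different pages could then collapse into one "duplicate". Here is a rough sketch of a fuller version, assuming absolute http(s) URLs. The names simplify_path and normalize_url are made up for this example, and the choices to keep a trailing slash, lowercase the scheme and host, drop the fragment and ignore any user:password part are my assumptions about what counts as "the same page", not something the question states.

```php
<?php
// Hypothetical helpers for illustration; not part of PHP itself.

// Collapse "." and ".." segments, keeping a trailing slash if there was one.
function simplify_path($path) {
    $out = array();
    $segments = explode('/', $path);
    $last = end($segments);
    foreach ($segments as $seg) {
        if ($seg === '..') {
            array_pop($out);           // step up one directory
        } elseif ($seg !== '.' && $seg !== '') {
            $out[] = $seg;             // keep real segments only
        }
    }
    $clean = '/' . implode('/', $out);
    // "/a/b/", "/a/b/." and "/a/b/.." all name a directory, so keep the slash
    if ($out && ($last === '' || $last === '.' || $last === '..')) {
        $clean .= '/';
    }
    return $clean;
}

// Rebuild the URL from its parts; returns null if it is not an absolute URL.
function normalize_url($url) {
    $u = parse_url($url);
    if ($u === false || !isset($u['scheme'], $u['host'])) {
        return null;
    }
    // Scheme and host are case-insensitive, so lowercase them (assumption:
    // we want such variants treated as duplicates).
    $clean = strtolower($u['scheme']) . '://' . strtolower($u['host']);
    if (isset($u['port'])) {
        $clean .= ':' . $u['port'];
    }
    $clean .= simplify_path(isset($u['path']) ? $u['path'] : '/');
    if (isset($u['query'])) {
        $clean .= '?' . $u['query'];
    }
    return $clean; // the fragment is dropped on purpose
}

echo normalize_url('http://www.site.com/./eng/.././disclaimer/index.htm?x=1');
```

You can then use the normalized string as an array key (or a unique column) to spot duplicates. Whether the query string should be kept as is, sorted, or dropped depends on the sites you crawl.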

Upvotes: 3

chelmertz

Reputation: 20601

What exactly makes you think those two URLs are equivalent?

If you can answer that question in detail, use a regexp or a parser to apply the rules that you know indicate the pages are equivalent.

Upvotes: 0
