user220755

Reputation: 4446

How to know if two URLS lead to the same page

I want to write a small script to find out whether two URLs lead to the same page. For example, http://google.com and http://google.com/# lead to the same page. Also, http://URL1.com and http://URL2.com sometimes lead to the same page even though they are different URLs.

Is there an easy way to do that?

If you need more information, please tell me and I will edit the post.

NOTE: This is NOT a homework question, so please be as helpful as you can.

Thank you all!

Upvotes: 1

Views: 1414

Answers (3)

LiraNuna

Reputation: 67330

This is a really dirty way, but I suppose that's what you want:

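$page1 = file_get_contents('http://URL1.com');
$page2 = file_get_contents('http://URL2.com');
// file_get_contents() returns false on failure; two failed fetches must not count as a match
if ($page1 !== false && $page1 === $page2) {
    // Leading to the same page!
}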

Note that it will NOT work if the page contains anything that changes between requests, such as a timestamp (e.g., one request is made at 13:45:59 and the other at 13:46:00), cookie-dependent content, or anything else dynamic.
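If small dynamic differences are the problem, one option (not part of the answer above, a swapped-in technique) is PHP's built-in similar_text(), which reports a similarity percentage instead of requiring an exact match. The helper name pages_look_same() and the 95% threshold are assumptions for illustration; tune the threshold for the pages you compare. The PHP manual documents similar_text() as O(N^3) in the length of the longer string, so this can be very slow on large pages.

```php
<?php
// Hypothetical helper: treat two page bodies as "the same" when
// similar_text() reports at least $threshold percent similarity.
// The 95.0 default is an arbitrary assumption, not a recommended value.
function pages_look_same(string $a, string $b, float $threshold = 95.0): bool {
    if ($a === $b) {
        return true; // identical bodies, skip the expensive comparison
    }
    // similar_text() writes the similarity percentage into $percent
    similar_text($a, $b, $percent);
    return $percent >= $threshold;
}

// Usage (performs real HTTP requests):
// $a = file_get_contents('http://URL1.com');
// $b = file_get_contents('http://URL2.com');
// if ($a !== false && $b !== false && pages_look_same($a, $b)) { /* same page */ }
```

A threshold trades false positives for false negatives: two genuinely different pages built from the same template can still score very high.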

Upvotes: 4

Tyler Carter

Reputation: 61597

So...

This can be very tricky, as there is no 'real' way to detect it. You could check for a Location header to see whether there is a redirect, but that is not foolproof, because some sites serve the same page without redirecting (an internal redirect). For example, stackoverflow.com could look exactly the same as stackoverflow2.com without either one sending a Location header.
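A minimal sketch of the Location-header check, assuming PHP's built-in get_headers() called with its second argument set to true, which returns an associative array of headers (status lines stay under numeric keys). The helper name redirect_target() is hypothetical. With the default stream context get_headers() follows redirects, so a chain of redirects can produce several Location values; the sketch returns the last one. The value may be a relative URL, and, as noted above, a missing Location header proves nothing.

```php
<?php
// Hypothetical helper: return the last Location header from a
// get_headers($url, true) result, or null if there was no redirect.
function redirect_target(array $headers): ?string {
    foreach ($headers as $name => $value) {
        // Status lines have integer keys; header names are case-insensitive
        if (is_string($name) && strcasecmp($name, 'Location') === 0) {
            if (is_array($value)) {
                // Several redirects in a row: the last value is the final target
                return $value ? (string) end($value) : null;
            }
            return (string) $value;
        }
    }
    return null;
}

// Usage (performs a real HTTP request):
// $headers = get_headers('http://URL1.com', true);
// if ($headers !== false) { var_dump(redirect_target($headers)); }
```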

The only real way I can think of is to compare the contents of the two pages, for example:

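$c = curl_init();
curl_setopt( $c, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $c, CURLOPT_FOLLOWLOCATION, true ); // follow HTTP redirects
curl_setopt( $c, CURLOPT_URL, 'http://URL1.com/' );
$content1 = curl_exec( $c );
curl_close($c);

$c = curl_init();
curl_setopt( $c, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $c, CURLOPT_FOLLOWLOCATION, true ); // follow HTTP redirects
curl_setopt( $c, CURLOPT_URL, 'http://URL2.com/' );
$content2 = curl_exec( $c );
curl_close($c);

// curl_exec() returns false on failure; === avoids PHP's loose string comparison
if($content1 !== false && $content1 === $content2)
{
    // same content
}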

If you wanted to, you could shorten that to compare only the Content-Length header or something similar, but headers alone are not enough to tell reliably whether two pages are the same.

Upvotes: 2

Ignacio Vazquez-Abrams

Reputation: 799560

You can use parse_url() to handle the trivial cases. To detect redirects, you'll have to use one of PHP's HTTP facilities (for example cURL or get_headers()) to fetch the response headers and look for a Location header.
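For the trivial cases, a minimal sketch built on parse_url() might normalize both URLs and compare the results. The helper names normalize_url() and same_url() are hypothetical, and the normalization rules are assumptions: scheme and host are lowercased, the fragment (#...) is dropped because browsers never send it to the server, default ports 80/443 are removed, an empty path becomes "/", and the query string is kept unchanged.

```php
<?php
// Hypothetical helper: normalize a URL so trivially different spellings
// compare equal. Userinfo (user:pass@) is ignored in this sketch.
function normalize_url(string $url): ?string {
    $p = parse_url($url);
    if ($p === false || !isset($p['host'])) {
        return null; // not an absolute URL this sketch can normalize
    }
    $scheme = strtolower($p['scheme'] ?? 'http');
    $host   = strtolower($p['host']);
    $port   = $p['port'] ?? null;
    // Drop the port when it is the default for the scheme
    $defaults = ['http' => 80, 'https' => 443];
    if ($port !== null && ($defaults[$scheme] ?? null) === $port) {
        $port = null;
    }
    $path  = ($p['path'] ?? '') === '' ? '/' : $p['path'];
    $query = isset($p['query']) ? '?' . $p['query'] : '';
    // The fragment is deliberately left out
    return $scheme . '://' . $host . ($port !== null ? ':' . $port : '') . $path . $query;
}

function same_url(string $a, string $b): bool {
    $na = normalize_url($a);
    return $na !== null && $na === normalize_url($b);
}

// Usage:
// var_dump(same_url('http://google.com', 'http://google.com/#')); // bool(true)
```

This only catches spelling differences. Two different hosts such as http://URL1.com and http://URL2.com still need an actual request, either a redirect check or a content comparison.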

Upvotes: 0
