AntonioJunior

Reputation: 959

How do I retrieve a URL's protocol ("http" or "https")?

I am using the PHP library Simple HTML DOM Parser, as suggested in "How do you parse and process HTML/XML in PHP?", to parse a webpage's HTML content.

To create the DOM, I have to do:

$html = file_get_html('http://www.example.com/');

The problem is that if I do:

$html = file_get_html('www.example.com');

without specifying the URL's protocol, I will get an error.

My question is: given only the string "www.example.com", how can I find out whether the full URL is "http://www.example.com/" or "https://www.example.com/"?

Upvotes: 1

Views: 836

Answers (3)

IslandCow

Reputation: 3532

You could try calling get_headers() on the http address and looking for an Upgrade: header in the response. If you get a valid response, use http; otherwise, try https.

Upvotes: 1

etuardu

Reputation: 5506

I can't think of anything smarter than assuming "http://" as the default and, if that fails, trying "https://":

if (!$html = file_get_html('http://' . $url)) $html = file_get_html('https://' . $url);
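The same fallback can be written as a small reusable helper. This is only a sketch: resolve_scheme() and $probe are hypothetical names, not part of Simple HTML DOM Parser, and the probe shown here assumes that a false return from get_headers() means the address is unreachable over that scheme.

```php
<?php
// Hypothetical helper: try each scheme in order and return the first URL
// for which $fetch returns something other than false, or null if none does.
function resolve_scheme(string $host, callable $fetch, array $schemes = ['http', 'https']): ?string
{
    foreach ($schemes as $scheme) {
        $url = $scheme . '://' . $host;
        if ($fetch($url) !== false) {
            return $url;
        }
    }
    return null;
}

// Example probe (an assumption, not the only option): ask for the response
// headers and suppress the warning PHP emits when the connection fails.
$probe = function (string $url) {
    return @get_headers($url);
};

// Possible usage with the parser from the question:
// $url  = resolve_scheme('www.example.com', $probe);
// $html = $url !== null ? file_get_html($url) : false;
```

Passing the fetch step in as a callable keeps the scheme order and the network call separate, so the order can be flipped to prefer https by passing ['https', 'http'] as the third argument.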

Upvotes: 2

Rajiv Makhijani

Reputation: 3651

There is no way to know, because both could be valid. I would assume http:// though, because the normal practice is to redirect http to https when https is required, and file_get_html should follow an HTTP 301 or 302 redirect.
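If you want to see which protocol the site ends up on after such a redirect, one option is to inspect the Location headers. The sketch below assumes you already have the header lines as a plain list, such as the one get_headers() returns when it follows redirects; final_scheme() is a hypothetical helper name, and relative Location values are treated as keeping the current scheme.

```php
<?php
// Hypothetical helper: given raw header lines and the URL that was requested,
// return the scheme of the last absolute Location header, or the scheme of
// the starting URL if no redirect changes it.
function final_scheme(array $headerLines, string $startUrl): ?string
{
    $scheme = parse_url($startUrl, PHP_URL_SCHEME) ?: null;
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, so match without regard to case.
        if (stripos($line, 'Location:') === 0) {
            $target = trim(substr($line, strlen('Location:')));
            $targetScheme = parse_url($target, PHP_URL_SCHEME);
            // Relative redirects have no scheme; keep the current one.
            if (is_string($targetScheme)) {
                $scheme = strtolower($targetScheme);
            }
        }
    }
    return $scheme;
}

// Possible usage (makes a network request):
// $lines = @get_headers('http://www.example.com/');
// $scheme = $lines !== false ? final_scheme($lines, 'http://www.example.com/') : null;
```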

Upvotes: 2
