Blurie

Reputation: 148

How to get the protocol (http or https) of the website using Python

I'm wondering how to imitate, in Python, the way a browser such as Chrome detects a website's protocol. For example, when we type "stackoverflow.com" in the address bar and press Enter, the browser automatically detects the protocol and changes the URL to "https://stackoverflow.com". How can we do the same in Python, like this:

url = "stackoverflow.com"
browser = Browser(url)  # Browser is a hypothetical class that fetches the site content from url, detects its protocol, ...
print(browser.protocol)

https

Is there any library or package that helps do this? Thanks a lot.

Edit: My question is different from the others, which ask how to redirect to https when http is entered. As I mentioned, can we detect the protocol automatically at the first stage, without supplying a dummy protocol?

Upvotes: 5

Views: 20135

Answers (3)

Jacobm001

Reputation: 4539

When you enter a URL without http:// or https://, the browser automatically assumes that you're using http:// and sends a request on port 80.

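If the site redirects you to an https site, the response of note is a 301, which indicates a non-error (permanent) redirect and carries a Location header with the https:// address. A 101 response, by contrast, indicates that the connection is switching protocols (for example to WebSocket); it is not part of a plain http-to-https redirect, so you will usually not see one here.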

You can see this happen if you open a new tab, load http://stackexchange.com, and watch the requests as they come in on the network tab of your web browser's developer tools.
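The same check can be sketched in Python without a browser. This is a minimal illustration using only the standard library, not something from the answer itself: `is_https_redirect` and `first_response` are hypothetical helper names, and the sketch assumes the server answers on port 80 and reports the new address in a `Location` header.

```python
import http.client
from urllib.parse import urlsplit


def is_https_redirect(status, location):
    """Return True if a response with this status and Location header
    redirects the client to an https:// URL."""
    return (
        300 <= status < 400
        and bool(location)
        and urlsplit(location).scheme == "https"
    )


def first_response(host, timeout=5):
    """Send one plain-HTTP GET to port 80 without following redirects.

    Returns (status, Location header or None). Hypothetical helper for
    illustration; it needs network access when called.
    """
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Calling `is_https_redirect(*first_response("stackexchange.com"))` would then report whether the first hop already points at https; the result depends on how the host is configured at the time.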

Note:

This behavior depends on the host. Not all websites will automatically redirect you to an https:// site, and a 101 upgrade is rarer still, so neither response is guaranteed.

If you really want to determine if https:// is the preferred option, you may want to manually check if it exists when you don't get a redirect.
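One way to do that manual check is sketched below, under stated assumptions: it uses only the standard library, `final_url` and `preferred_scheme` are hypothetical names, and any connection, TLS, or HTTP error is simply treated as "that scheme is not usable". It tries http:// first and, if that does not end up on https, probes https:// directly.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def final_url(url, timeout=5):
    """Fetch url, following redirects, and return the URL we ended up on."""
    with urlopen(url, timeout=timeout) as response:
        return response.geturl()


def preferred_scheme(host, fetch=final_url):
    """Return 'https' or 'http' for a bare host name, or None if unreachable.

    `fetch` is injectable so the logic can be exercised without a network.
    """
    try:
        landed = fetch("http://" + host)
        if urlsplit(landed).scheme == "https":
            return "https"  # the site redirected us to https
        http_ok = True
    except OSError:  # URLError, HTTPError, timeouts and SSL errors derive from OSError
        http_ok = False
    try:
        fetch("https://" + host)  # no redirect seen: check https manually
        return "https"
    except OSError:
        return "http" if http_ok else None
```

For example, `preferred_scheme("stackoverflow.com")` would perform the live check; passing a stub as `fetch` keeps the function testable offline.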

Upvotes: 5

Vinícius Figueiredo

Reputation: 6508

Since you mentioned "browser" and "Chrome" behaviour, one can get the same results as @BurhanKhalid's really good answer using selenium:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://stackoverflow.com") #Trying http first
url = driver.current_url

>>> print(url[:url.find(":")])
https
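Slicing up to the first colon works for the URL above. As an alternative sketch, the standard library's `urllib.parse.urlsplit` can pull out the scheme; `scheme_of` is a hypothetical helper name introduced here for illustration.

```python
from urllib.parse import urlsplit


def scheme_of(url):
    """Return the scheme part of a URL ('' if the URL has none)."""
    return urlsplit(url).scheme


print(scheme_of("https://stackoverflow.com/"))  # https
```

With selenium this would be called as `scheme_of(driver.current_url)`.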

Upvotes: 3

Burhan Khalid

Reputation: 174624

It works for stackoverflow because when you first visit stackoverflow.com on port 80 (the http port), stackoverflow's servers notify the browser that the link has been permanently moved to https.

To detect the same in Python, use the requests library, like this:

>>> import requests
>>> r = requests.get('http://stackoverflow.com') # first we try http
>>> r.url # check the actual URL for the site
'https://stackoverflow.com/'

To find out how the URL changed, look at the history object, and you will see a 301 response, which means the URI has moved permanently to a new address.

>>> r.history[0]
<Response [301]>
>>> r.history[0].url # this is the original URL we tried
'http://stackoverflow.com/'
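To see every hop at once, the history can be flattened into a list. The sketch below is an illustration rather than part of the requests API: `redirect_chain` and `final_scheme` are hypothetical helpers that only rely on the `history`, `status_code`, and `url` attributes of the response object used above.

```python
from urllib.parse import urlsplit


def redirect_chain(response):
    """Return [(status_code, url), ...] for each redirect hop,
    ending with the final response itself."""
    hops = list(response.history) + [response]
    return [(hop.status_code, hop.url) for hop in hops]


def final_scheme(response):
    """Return the scheme ('http' or 'https') of the URL we ended up on."""
    return urlsplit(response.url).scheme
```

With requests installed, `final_scheme(requests.get('http://stackoverflow.com'))` gives the protocol the question asks for, and `redirect_chain(...)` on the same response lists the 301 hop followed by the final response.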

Upvotes: 16
