Reputation: 148
I'm wondering how to imitate, in Python, the way a browser such as Chrome detects a website's protocol. For example, when we type "stackoverflow.com" in the address bar and press Enter, the browser automatically detects the protocol and changes the URL to "https://stackoverflow.com". How can we do the same in Python, something like this:
url = "stackoverflow.com"
browser = Browser(url)  # Browser is a hypothetical class that fetches a site's content, its protocol, ...
print(browser.protocol)  # desired output: https
Is there any library or package that helps do this? Thanks a lot.
Edit: My question is different because the other question asks how to redirect to https when we enter http. As I mentioned, can we detect the protocol automatically at the first stage, without a dummy protocol?
Upvotes: 5
Views: 20135
Reputation: 4539
When you enter a URL without http:// or https://, the browser has traditionally assumed that you mean http:// and sent a request on port 80 (newer browser versions may try https:// first).
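As a rough illustration of that first step, here is a minimal sketch of how a bare address could be turned into a full URL before any request is made. The function name with_default_scheme and the default of "http" are assumptions made for this example, not something a browser or library exposes.

```python
from urllib.parse import urlsplit


def with_default_scheme(address, default="http"):
    """Prepend a default scheme when the address has none (hypothetical helper)."""
    if "://" in address:
        return address  # the user already typed a protocol, keep it
    return f"{default}://{address}"


url = with_default_scheme("stackoverflow.com")
print(url)                   # http://stackoverflow.com
print(urlsplit(url).scheme)  # http
```

Checking for "://" rather than parsing the bare address avoids misreading something like "localhost:8080" as having a scheme.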
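If the site redirects you to an https site, there are two status codes of note. One is 301 (sites may also use other 3xx codes such as 302, 307 or 308), which indicates a redirect rather than an error. The other is 101 Switching Protocols, which indicates that the connection is being upgraded to a different protocol; it is separate from the https redirect itself and you will often not see it at all.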
You can see this happen if you open a new tab, load http://stackexchange.com and watch the requests as they come in on the network tab of your web browser's developer tools.
Note:
Both codes depend on the host supporting this behavior. Not all websites will automatically redirect you to an https:// site. Additionally, not all of them support HTTP/2, so you may not get the 101 upgrade.
If you really want to determine whether https:// is the preferred option, you may want to manually check whether it exists when you don't get a redirect.
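One way such a manual check could look is sketched below, using only the standard library. It tries https:// first and falls back to http://, then reports the scheme of the URL it ended up on. The names detect_protocol and probe_url are made up for this example, and treating any HTTP error status as "the server answered" is an assumption of the sketch rather than established browser behaviour.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def probe_url(url, timeout=5.0):
    """Return the final URL after redirects, or None if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.url
    except urllib.error.HTTPError as err:
        # The server replied (e.g. 403 or 404), so the scheme itself works.
        return err.url
    except (OSError, ValueError):
        # DNS failure, refused connection, TLS error, timeout, bad URL, ...
        return None


def detect_protocol(host, probe=probe_url):
    """Try https:// first, then http://; return the final scheme or None."""
    for scheme in ("https", "http"):
        final_url = probe(f"{scheme}://{host}")
        if final_url is not None:
            return urlsplit(final_url).scheme
    return None


# Example (needs network access):
# print(detect_protocol("stackoverflow.com"))
```

The probe argument exists only so the network call can be swapped out, for instance in tests; with the default it performs real requests.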
Upvotes: 5
Reputation: 6508
Since you mentioned "browser" and "Chrome" behaviour, you can get the same result as @BurkhanKhalid's really good answer using selenium:
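from urllib.parse import urlsplit

from selenium import webdriver

driver = webdriver.Chrome()  # needs Chrome and a matching driver available
driver.get("http://stackoverflow.com")  # try http first and let the browser follow redirects
url = driver.current_url  # the URL the browser ended up on
driver.quit()  # close the browser window
print(urlsplit(url).scheme)  # https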
Upvotes: 3
Reputation: 174624
It works for stackoverflow because when you first visit stackoverflow.com on port 80 (the http port), stackoverflow's servers notify the browser that the link has been permanently moved to https.
To detect the same in Python, use the requests library, like this:
>>> import requests
>>> r = requests.get('http://stackoverflow.com') # first we try http
>>> r.url # check the actual URL for the site
'https://stackoverflow.com/'
To find out how the URL changed, look at the history object, and you will see a 301 response, which means the URI has moved permanently to a new address.
>>> r.history[0]
<Response [301]>
>>> r.history[0].url # this is the original URL we tried
'http://stackoverflow.com/'
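The steps above could be wrapped in two small helpers, sketched here. They only read the url, status_code and history attributes that a requests response provides; the names redirect_chain and final_scheme are invented for this example.

```python
from urllib.parse import urlsplit


def redirect_chain(response):
    """Return (status_code, url) for every hop, ending with the final response."""
    hops = [(r.status_code, r.url) for r in response.history]
    hops.append((response.status_code, response.url))
    return hops


def final_scheme(response):
    """Return the protocol of the URL the request finally landed on."""
    return urlsplit(response.url).scheme


# Example (needs the requests package and network access):
# import requests
# r = requests.get("http://stackoverflow.com", timeout=10)
# print(redirect_chain(r))
# print(final_scheme(r))
```

Keeping the helpers independent of the actual request makes it easy to reuse them on a response you already have.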
Upvotes: 16