Daniel Pilch

Reputation: 2247

Python Requests library redirect new url

I've been looking through the Python Requests documentation but I cannot see any functionality for what I am trying to achieve.

In my script I am setting allow_redirects=True.

If the page has been redirected, I would like to know what the new URL is.

For example, if the start URL was: www.google.com/redirect

And the final URL is www.google.co.uk/redirected

How do I get that URL?

Upvotes: 155

Views: 336415

Answers (8)

Jossef Harush Kadouri

Reputation: 34207

I wrote the following function to get the full URL from a short URL (bit.ly, t.co, ...)

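import requests
from urllib.parse import urljoin

def expand_short_url(url):
    # Don't follow redirects; inspect only the first response
    r = requests.head(url, allow_redirects=False)
    r.raise_for_status()
    if 300 <= r.status_code < 400:
        # Location may be relative, so resolve it against the short URL
        url = urljoin(url, r.headers.get('Location', url))

    return url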

Usage (short URL is this question's url):

short_url = 'https://tinyurl.com/' + '4d4ytpbx'
full_url = expand_short_url(short_url)
print(full_url)

Output:

https://stackoverflow.com/questions/20475552/python-requests-library-redirect-new-url

Upvotes: 3

Tushar

Reputation: 1104

All the answers above work when the final URL exists and responds normally. If the final URL no longer works, approaches such as reading the response history raise an error instead. I ran into exactly that scenario, so the snippet below captures every redirect one hop at a time, which still gives you the last URL in the chain.
Code Snippet

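import requests
from urllib.parse import urljoin

url = 'http://example.com/bla-bla'
long_url = url
try:
    for _ in range(30):  # cap the hops so a redirect loop can't run forever
        # requests.head() doesn't follow redirects by default, so each call is one hop;
        # a response without a Location header raises KeyError and ends the chain
        long_url = urljoin(url, requests.head(url).headers['Location'])
        print(long_url)
        url = long_url
except (KeyError, requests.RequestException):
    pass  # no further redirect, or the next URL is dead
print(long_url)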
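The same idea can be wrapped in a reusable function that returns the whole chain as a list instead of printing it. This is a sketch under a few assumptions: `expand_url_chain` and its `max_hops` parameter are hypothetical names, the 10-second timeout is an arbitrary choice, and it stops quietly on any request error so you keep every URL reached before the dead one.

```python
from urllib.parse import urljoin

import requests


def expand_url_chain(url, max_hops=10):
    """Follow redirects one HEAD request at a time; return every URL reached."""
    chain = [url]
    for _ in range(max_hops):
        try:
            # requests.head() does not follow redirects by default, so each call is one hop
            r = requests.head(url, allow_redirects=False, timeout=10)
        except requests.RequestException:
            # The current URL is dead or unreachable; keep what we have so far
            break
        # is_redirect is True only for a redirect status code with a Location header
        if not r.is_redirect:
            break
        # Location may be relative (e.g. '/new-path'), so resolve it against the current URL
        url = urljoin(url, r.headers['Location'])
        chain.append(url)
    return chain
```

Calling `expand_url_chain('http://example.com/bla-bla')` returns a list whose last element is the final URL, even when that URL no longer resolves.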

Upvotes: 1

Shahin Shirazi

Reputation: 439

I wasn't able to use the requests library and had to take a different approach. Here is the code I posted as a solution to this post (To get redirected URL with requests).

With this approach you actually open the browser, wait for it to record the URL in its history, and then read the last URL from the history database. I wrote this code for Google Chrome, but you should be able to follow along if you are using a different browser.

import shutil
import sqlite3
import time
import webbrowser

import pandas as pd

webbrowser.open("https://twitter.com/i/user/2274951674")
# source_file is where the browser saves its history; I was using Chrome, but the process should be similar for other browsers
source_file = 'C:\\Users\\{user}\\AppData\\Local\\Google\\Chrome\\User Data\\Default\\History'
# Chrome keeps the history file locked, so copy it to a different location first
destination_file = 'C:\\Users\\{user}\\Downloads\\History'
time.sleep(30)  # the history file is updated with some delay; 30 s gives the last URL time to be logged
shutil.copy(source_file, destination_file)  # copy the locked file
con = sqlite3.connect(destination_file)  # connect to the copied browser history
cursor = con.execute("SELECT * FROM urls")
names = [description[0] for description in cursor.description]
urls = cursor.fetchall()
con.close()
df_history = pd.DataFrame(urls, columns=names)
last_url = df_history.loc[len(df_history) - 1, 'url']
print(last_url)

>>https://twitter.com/ozanbayram01
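If you do go the browser-history route, taking the last row of `SELECT * FROM urls` gives the most recently inserted URL, which is not necessarily the most recently visited one. A smaller sketch with only `sqlite3`, assuming Chrome's internal `urls` table has `url` and `last_visit_time` columns (this is Chrome's private schema, so treat it as an assumption that may change); `latest_history_url` is a hypothetical name, and its argument is the path of your copy of the History file.

```python
import sqlite3


def latest_history_url(history_copy_path):
    """Return the most recently visited URL from a copy of Chrome's History database."""
    con = sqlite3.connect(history_copy_path)
    try:
        # Assumes Chrome's schema: the urls table stores last_visit_time for each URL
        row = con.execute(
            "SELECT url FROM urls ORDER BY last_visit_time DESC LIMIT 1"
        ).fetchone()
    finally:
        con.close()
    # None if the table is empty
    return row[0] if row else None
```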

Upvotes: -1

Geng Jiawen

Reputation: 9154

I think requests.head is safer than requests.get when you only need to resolve a URL redirect, because a HEAD request does not download the response body. Check a GitHub issue here:

r = requests.head(url, allow_redirects=True)
print(r.url)
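One caveat with HEAD: some servers do not implement it and answer 405 Method Not Allowed. A minimal sketch that falls back to a streamed GET in that case; with `stream=True` only the headers are fetched up front, and the `with` block releases the connection without reading the body. The name `final_url`, the 10-second timeout and the choice to fall back only on 405 are assumptions for this example.

```python
import requests


def final_url(url):
    """Resolve the final URL after redirects, preferring HEAD over GET."""
    r = requests.head(url, allow_redirects=True, timeout=10)
    if r.status_code == 405:
        # The server rejects HEAD; a streamed GET follows the same redirects
        # without downloading the body, and the with-block closes the connection
        with requests.get(url, allow_redirects=True, stream=True, timeout=10) as r:
            return r.url
    return r.url
```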

Upvotes: 59

Martijn Pieters

Reputation: 1121176

You are looking for the request history.

The response.history attribute is a list of responses that led to the final URL, which can be found in response.url.

import requests

response = requests.get(someurl)
if response.history:
    print("Request was redirected")
    for resp in response.history:
        print(resp.status_code, resp.url)
    print("Final destination:")
    print(response.status_code, response.url)
else:
    print("Request was not redirected")

Demo:

>>> import requests
>>> response = requests.get('http://httpbin.org/redirect/3')
>>> response.history
(<Response [302]>, <Response [302]>, <Response [302]>)
>>> for resp in response.history:
...     print(resp.status_code, resp.url)
... 
302 http://httpbin.org/redirect/3
302 http://httpbin.org/redirect/2
302 http://httpbin.org/redirect/1
>>> print(response.status_code, response.url)
200 http://httpbin.org/get
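If you want the whole path as data rather than printed lines, the same attributes can be collected into a list. `redirect_chain` is a hypothetical helper name; it only reads `response.history`, `status_code` and `url`, which the code above already uses.

```python
def redirect_chain(response):
    """Return (status_code, url) for each redirect hop, followed by the final response."""
    hops = [(resp.status_code, resp.url) for resp in response.history]
    hops.append((response.status_code, response.url))
    return hops
```

Called as `redirect_chain(requests.get(someurl))`, the last tuple is the final destination, and a one-element list means the request was not redirected.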

Upvotes: 222

Back2Basics

Reputation: 7806

The documentation has this blurb: https://requests.readthedocs.io/en/master/user/quickstart/#redirection-and-history

import requests

r = requests.get('http://www.github.com')
r.url
# returns https://www.github.com instead of the http page you asked for

Upvotes: 48

Shuai.Z

Reputation: 386

For Python 3.5, you can use the standard library's urllib.request instead:

import urllib.request
res = urllib.request.urlopen(starturl)
finalurl = res.geturl()
print(finalurl)
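`geturl()` only gives the final URL. To also see the intermediate hops with the standard library, one option is to subclass `urllib.request.HTTPRedirectHandler` and record each target in `redirect_request`, which urllib calls for every redirect it encounters. `RecordingRedirectHandler` and `open_with_hops` are hypothetical names for this sketch.

```python
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """HTTPRedirectHandler that remembers every redirect it follows."""

    def __init__(self):
        super().__init__()
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # urllib has already resolved newurl to an absolute URL at this point
        new_req = super().redirect_request(req, fp, code, msg, headers, newurl)
        if new_req is not None:
            # Only record hops that urllib actually agreed to follow
            self.hops.append((code, newurl))
        return new_req


def open_with_hops(url):
    """Open url following redirects; return (final_url, [(code, url), ...])."""
    handler = RecordingRedirectHandler()
    # Passing an instance of the subclass replaces the default redirect handler
    opener = urllib.request.build_opener(handler)
    with opener.open(url) as res:
        return res.geturl(), handler.hops
```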

Upvotes: 14

hwjp

Reputation: 16071

This is answering a slightly different question, but since I got stuck on this myself, I hope it might be useful for someone else.

If you want to use allow_redirects=False and stop at the first redirect response rather than following a chain of them, and you just want to read the redirect location directly from the 302 response, then r.url won't work, because it is still the URL you requested. Instead, use the "Location" header:

r = requests.get('http://github.com/', allow_redirects=False)
r.status_code  # 302
r.url  # http://github.com, not https.
r.headers['Location']  # https://github.com/ -- the redirect destination
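The Location header is not guaranteed to be absolute; a server may send something like `/login`. A minimal sketch, assuming you want an absolute URL back: it checks `Response.is_redirect` (True only for a redirect status code with a Location header) and resolves the header against the request URL with `urljoin`. `first_redirect_target` is a hypothetical name and the 10-second timeout is an arbitrary choice.

```python
from urllib.parse import urljoin

import requests


def first_redirect_target(url):
    """Return the absolute URL of the first redirect, or None if there isn't one."""
    r = requests.get(url, allow_redirects=False, timeout=10)
    if not r.is_redirect:
        return None
    # Resolve a relative Location such as '/login' against the URL that was requested
    return urljoin(r.url, r.headers['Location'])
```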

Upvotes: 102
