Reputation: 1422
I'm currently trying to audit a large number of redirect URL handles to make sure that their destinations are still valid.
I'm using aiohttp to go through the large volume in order to produce a report.
try:
    with aiohttp.Timeout(timeout):
        async with session.get(url) as resp:
            return {"Handle URL": url,
                    "Status Code": resp.status,
                    "Redirects": resp.url != url,
                    "Resolving URL": resp.url,
                    "Success": resp.status == 200,
                    "Message": ""}
except asyncio.TimeoutError:
    return {"Handle URL": url,
            "Success": False,
            "Message": "Handle server timed out. >{} seconds".format(timeout)}
For the most part, this has worked well for identifying which redirects no longer lead to a valid URL. However, when a request times out, I'd really like to know the final address it was trying to reach.
Any ideas?
Upvotes: 4
Views: 9525
Reputation: 30106
I don't think it is necessary anymore to parse that string for a Location. Here is a small example.
Local flask server with a redirect:
from flask import Flask, redirect

app = Flask(__name__)


@app.route('/')
def hello_world():
    return 'Hello World!'


@app.route('/redirect')
def redir():
    return redirect('/')


if __name__ == '__main__':
    app.run()
aiohttp request to that redirect:
# coding: utf-8
import asyncio
import aiohttp


async def fetch(URL):
    async with aiohttp.ClientSession() as session:
        # First request: do not follow the redirect, so the 3xx response itself is inspected.
        async with session.get(URL, allow_redirects=False) as response:
            print(response.url, response.real_url, 'location' in str(response).lower())
        # Second request: let aiohttp follow the redirect to the final response.
        async with session.get(URL, allow_redirects=True) as response:
            print(response.url, response.real_url, 'location' in str(response).lower())


local_url = "http://127.0.0.1:5000/redirect"


async def main():
    await fetch(local_url)


loop = asyncio.new_event_loop()
loop.run_until_complete(main())
prints:
http://127.0.0.1:5000/redirect http://127.0.0.1:5000/redirect True
http://127.0.0.1:5000/ http://127.0.0.1:5000/ False
According to the docs, the difference between url and real_url is that real_url is the unmodified URL of the request with its fragment left in place, while url has the fragment stripped.
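To make "stripped" concrete, here is a small illustration of removing a URL fragment. It uses urllib.parse.urldefrag from the standard library as a stand-in, not aiohttp itself, and the "#section" fragment is a made-up example.

```python
from urllib.parse import urldefrag

# Hypothetical request URL carrying a fragment ("#section").
requested = "http://127.0.0.1:5000/redirect#section"

# urldefrag splits the URL into the part before "#" and the fragment itself.
stripped, fragment = urldefrag(requested)

print(stripped)   # the URL without its fragment
print(fragment)   # the fragment that was removed
```

In the example from this answer the URL has no fragment, which is why url and real_url print the same value.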
Upvotes: 1
Reputation: 705
async with aiohttp.ClientSession() as session:
    async with session.get(URL, allow_redirects=False) as response:
        Location = str(response).split("Location': \'")[1].split("\'")[0]
        return Location
Upvotes: 3
Reputation: 17376
The only way to do this is to disable redirects with allow_redirects=False and follow the redirections manually.
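A rough sketch of what following redirects manually might look like, so that a timeout can be reported against the hop that was actually being requested. The names trace_redirects, fetch_hop, REDIRECT_CODES and max_hops are made up for this illustration, and session is assumed to be an aiohttp.ClientSession (anything with a compatible get() would do). The time limit here is applied per hop with asyncio.wait_for rather than with the aiohttp.Timeout helper from the question.

```python
import asyncio
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch.
REDIRECT_CODES = {301, 302, 303, 307, 308}


async def fetch_hop(session, url):
    """Request one URL without following redirects; return (status, Location)."""
    async with session.get(url, allow_redirects=False) as resp:
        return resp.status, resp.headers.get("Location")


async def trace_redirects(session, url, timeout=10, max_hops=10):
    """Follow redirects one hop at a time, remembering the URL being requested."""
    current = url  # the address currently being requested
    try:
        for _ in range(max_hops):
            # Each hop gets its own time limit.
            status, location = await asyncio.wait_for(
                fetch_hop(session, current), timeout)
            if status not in REDIRECT_CODES or not location:
                # Not a redirect, so this is the final destination.
                return {"Handle URL": url,
                        "Status Code": status,
                        "Redirects": current != url,
                        "Resolving URL": current,
                        "Success": status == 200,
                        "Message": ""}
            # Location may be relative, so resolve it against the current URL.
            current = urljoin(current, location)
        return {"Handle URL": url,
                "Resolving URL": current,
                "Success": False,
                "Message": "Gave up after {} redirects".format(max_hops)}
    except asyncio.TimeoutError:
        # `current` is the hop that failed to answer in time.
        return {"Handle URL": url,
                "Resolving URL": current,
                "Success": False,
                "Message": "Timed out after {} seconds at {}".format(timeout, current)}
```

It would be awaited inside an `async with aiohttp.ClientSession() as session:` block, for example `report = await trace_redirects(session, url, timeout=10)`. On a timeout, the "Resolving URL" entry holds the last address that was requested.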
Upvotes: 6