loneraver

Reputation: 1422

Using Python 3 and aiohttp to find the URL after a redirect when the request times out

I'm currently trying to audit a large number of redirect URL handles to make sure that their destinations are still valid.

I'm using aiohttp to go through the large volume in order to produce a report.

try:
    with aiohttp.Timeout(timeout):
        async with session.get(url) as resp:
            return {"Handle URL": url,
                    "Status Code": resp.status,
                    "Redirects": resp.url != url,
                    "Resolving URL": resp.url,
                    "Success": resp.status == 200,
                    "Message": ""}
except asyncio.TimeoutError:
    return {"Handle URL": url,
            "Success": False,
            "Message": "Handle server timed out. >{} seconds".format(timeout)}

For the most part, this has been fine for identifying which redirect URLs no longer lead to a valid destination. However, when a request times out, I'd really like to know the final address it was trying to reach.

Any ideas?

Upvotes: 4

Views: 9525

Answers (3)

lhk

Reputation: 30106

I don't think it is necessary anymore to parse that string for a Location. Here is a small example.

Local flask server with a redirect:

from flask import Flask, redirect

app = Flask(__name__)


@app.route('/')
def hello_world():
    return 'Hello World!'

@app.route('/redirect')
def redir():
    return redirect('/')


if __name__ == '__main__':
    app.run()

aiohttp request to that redirect:

# coding: utf-8
import asyncio
import aiohttp


async def fetch(URL):
    async with aiohttp.ClientSession() as session:
        async with session.get(URL, allow_redirects=False) as response:
            print(response.url, response.real_url, 'location' in str(response).lower())

        async with session.get(URL, allow_redirects=True) as response:
            print(response.url, response.real_url, 'location' in str(response).lower())

url = "http://127.0.0.1:5000/redirect"

async def main():
    await fetch(url)

loop = asyncio.new_event_loop()
loop.run_until_complete(main())

prints:

http://127.0.0.1:5000/redirect http://127.0.0.1:5000/redirect True
http://127.0.0.1:5000/ http://127.0.0.1:5000/ False

According to the docs, the difference between url and real_url is that real_url is the unmodified URL of the request with the fragment left in place, whereas url has the fragment stripped.

Upvotes: 1

user2643679

Reputation: 705

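import aiohttp


async def get_location(URL):
    async with aiohttp.ClientSession() as session:
        async with session.get(URL, allow_redirects=False) as response:
            # Pull the Location header value out of the response's string form
            Location = str(response).split("Location': \'")[1].split("\'")[0]
            return Location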

Upvotes: 3

Andrew Svetlov

Reputation: 17376

The only way to do it is to disable redirects with allow_redirects=False and follow the redirections manually.
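
As a rough illustration of what following redirects manually could look like, here is a minimal sketch. The names trace_redirects, _one_hop, REDIRECT_CODES and max_hops are made up for this example and are not part of aiohttp. It assumes the session object behaves like aiohttp.ClientSession (session.get(url, allow_redirects=False) used as an async context manager, with resp.status and resp.headers). Note that the timeout here applies to each hop separately rather than to the whole chain, which differs from the code in the question.

```python
import asyncio
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch (an assumption, adjust as needed)
REDIRECT_CODES = {301, 302, 303, 307, 308}


async def _one_hop(session, url):
    """Request a single URL without letting the client follow redirects."""
    async with session.get(url, allow_redirects=False) as resp:
        return resp.status, resp.headers.get("Location")


async def trace_redirects(session, url, timeout=10, max_hops=10):
    """Follow redirects by hand so the last URL attempted is always known."""
    current = url
    report = {"Handle URL": url, "Success": False, "Message": ""}
    for _ in range(max_hops):
        # Record the URL before requesting it, so a timeout still reports it
        report["Resolving URL"] = current
        report["Redirects"] = current != url
        try:
            status, location = await asyncio.wait_for(
                _one_hop(session, current), timeout)
        except asyncio.TimeoutError:
            report["Message"] = "Timed out after {} seconds at {}".format(
                timeout, current)
            return report
        report["Status Code"] = status
        if status in REDIRECT_CODES and location:
            # The Location header may be relative, so resolve it against
            # the URL that produced it
            current = urljoin(current, location)
            continue
        report["Success"] = status == 200
        return report
    report["Message"] = "Gave up after {} redirects".format(max_hops)
    return report


# Usage with aiohttp (assumes aiohttp is installed and imported):
#     async with aiohttp.ClientSession() as session:
#         report = await trace_redirects(session, "http://127.0.0.1:5000/redirect")
```

With this shape, a timeout report still carries "Resolving URL", i.e. the address that was being requested when the time ran out, which is the piece of information the question asks for.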

Upvotes: 6
