How do I get the real file URL in Python 2.7?

Question

I have a URL, http://www.vbb.de/de/datei/GTFS_VBB_Nov2015_Dez2016.zip, which "redirects" me to http://images.vbb.de/assets/ftp/file/286316.zip. I put "redirects" in quotes because Python reports that there is no redirect:

    In [51]: response = requests.get('http://www.vbb.de/de/datei/GTFS_VBB_Nov2015_Dez2016.zip')
        ...: if response.history:
        ...:     print "Request was redirected"
        ...:     for resp in response.history:
        ...:         print resp.status_code, resp.url
        ...:     print "Final destination:"
        ...:     print response.status_code, response.url
        ...: else:
        ...:     print "Request was not redirected"
        ...:     
    Request was not redirected

The status code is also 200, response.history is empty, and response.url returns the first URL rather than the real one. Firefox does show the real URL under Developer Tools -> Network, though. How do I get it in Python 2.7? Thanks in advance!

Martin Evans · Accepted Answer

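The first request is not redirected at the HTTP level: it returns an HTML page whose JavaScript sets window.location.href, which is why requests reports no redirect. You need to carry out that step manually by parsing the new window.location.href value from the returned HTML. Requesting that URL then produces a 301 response whose Location header contains the name of the target file: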

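    import re
    import requests

    base_url = 'http://www.vbb.de'
    # The first response is an HTML page, not an HTTP redirect.
    response = requests.get(base_url + '/de/datei/GTFS_VBB_Nov2015_Dez2016.zip')
    # Pull the target path out of the page's window.location.href assignment.
    manual_redirect = base_url + re.findall(r'window\.location\.href\s+=\s+"(.*?)"', response.text)[0]
    # This request is answered with a real redirect, which requests follows.
    response = requests.get(manual_redirect, stream=True)
    # The first history entry is the redirect; its Location header names the file.
    target_filename = response.history[0].headers['Location'].split('/')[-1]

    print "Downloading: '{}'".format(target_filename)
    with open(target_filename, 'wb') as f_zip:
        for chunk in response.iter_content(chunk_size=1024):
            f_zip.write(chunk)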

This would display:

    Downloading: '286316.zip'

and result in a 29,464,299 byte zip file being created.
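If you want to check the parsing step without hitting the server, the two string operations can be pulled out into small helpers. This is only a sketch: find_js_redirect and filename_from_url are hypothetical names, the sample HTML is made up for illustration (the real page body is not shown here), and the import fallback is there so it runs on both Python 2.7 and Python 3.

```python
import posixpath
import re

try:  # Python 3
    from urllib.parse import urljoin, urlsplit
except ImportError:  # Python 2.7
    from urlparse import urljoin, urlsplit

# Matches: window.location.href = "..." (single or double quotes).
JS_REDIRECT = re.compile(r"""window\.location\.href\s*=\s*["'](.*?)["']""")


def find_js_redirect(page_url, html):
    """Return the absolute URL of a JavaScript redirect in html, or None."""
    match = JS_REDIRECT.search(html)
    if match is None:
        return None
    # urljoin resolves relative targets against the page that served them.
    return urljoin(page_url, match.group(1))


def filename_from_url(url):
    """Return the last path segment of url, ignoring any query string."""
    return posixpath.basename(urlsplit(url).path)


# Hypothetical page body, for illustration only.
sample_html = '<script>window.location.href = "/de/download/example";</script>'
redirect_url = find_js_redirect(
    "http://www.vbb.de/de/datei/GTFS_VBB_Nov2015_Dez2016.zip", sample_html)
print(redirect_url)
print(filename_from_url("http://images.vbb.de/assets/ftp/file/286316.zip"))
```

Returning None when no redirect is found avoids the IndexError that indexing the result of re.findall with [0] would raise if the page layout ever changes.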
