user2198096

Reputation: 11

How to capture the redirected URL in Python

I have created a page on my site, http://shedez.com/test.html. This page redirects users to a JPG on my server.

I want to copy this image to my local drive using a Python script. The script should go to the main URL first, follow the redirect to the destination URL of the picture, and then copy the image. For now the destination URL is hardcoded, but in the future it will be dynamic, because I will use geocoding to find the city via IP and then redirect my users to the picture of the day from their city.

== my present script ===

import os
import tempfile
import urllib2

req = urllib2.urlopen("http://shedez.com/test.html")

# info() returns the response headers, not the redirected URL
final_link = req.info()
print req.info()

def get_image(remote, local):
    # download `remote` and save it to the path `local`
    imgData = urllib2.urlopen(remote).read()
    output = open(local, 'wb')
    output.write(imgData)
    output.close()
    return local

fn = os.path.join(tempfile.gettempdir(), 'bells.jpg')
firstimg = get_image(final_link, fn)

Upvotes: 1

Views: 1403

Answers (4)

Bibhas Debnath

Reputation: 14939

It doesn't seem to be a header redirect. This is the body returned by the URL:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Your Page Title</title>
<meta http-equiv="REFRESH" content="0;url=http://2.bp.blogspot.com/-hF8PH92aYT0/TnBxwuDdcwI/AAAAAAAAHMo/71umGutZhBY/s1600/Professional%2BBusiness%2BCard%2BDesign%2B1.jpg"></HEAD>
<BODY>
Optional page text here.
</BODY>
</HTML>

You can easily fetch the content with urllib or requests and parse the HTML with BeautifulSoup or lxml to get the image url from the meta tag.
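As a rough illustration of that approach, here is a minimal sketch that uses only the Python 3 standard library (`html.parser` instead of BeautifulSoup or lxml, and `urllib.request`, which replaced `urllib2`). The names `MetaRefreshParser`, `find_meta_refresh` and `download_redirected_image` are made up for this example, and it assumes the page contains a single meta refresh tag of the form shown above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class MetaRefreshParser(HTMLParser):
    """Remembers the target of the first <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.refresh_url = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta" or self.refresh_url is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0;url=http://host/picture.jpg".
        content = attrs.get("content") or ""
        _, _, rest = content.partition(";")
        key, _, value = rest.strip().partition("=")
        if key.strip().lower() == "url":
            self.refresh_url = value.strip().strip("'\"")


def find_meta_refresh(html, base_url):
    """Return the absolute meta refresh target, or None if there is none."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if parser.refresh_url is None:
        return None
    return urljoin(base_url, parser.refresh_url)


def download_redirected_image(page_url, local_path):
    """Fetch page_url, follow its meta refresh, and save the target locally."""
    with urlopen(page_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
        reached_url = resp.geturl()
    # If there is no meta refresh, fall back to the URL urlopen ended up at.
    image_url = find_meta_refresh(html, page_url) or reached_url
    with urlopen(image_url) as img, open(local_path, "wb") as out:
        out.write(img.read())
    return image_url
```

Calling `download_redirected_image("http://shedez.com/test.html", "bells.jpg")` would then save whatever the meta tag points to, assuming the page is still served in that form.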

Upvotes: 3

Supreet Sethi

Reputation: 1806

By default, urllib2's urlopen function follows redirects signalled by 3XX HTTP status codes. In your case, however, the redirect is an HTML meta refresh tag, so you will have to use what Bibhas is proposing.

Upvotes: 0

jtmoulia

Reputation: 660

As the other answers mention, either redirect to the image itself or parse the URL out of the HTML.

Concerning the former (redirecting): if you're using nginx or HAProxy on the server side, you can set the X-Accel-Redirect header to the image's URI, and it will be served appropriately. See http://wiki.nginx.org/X-accel for more info.

Upvotes: 0

bereal

Reputation: 34282

You seem to be using an HTML http-equiv redirect. To have Python handle the redirect transparently, send an HTTP 302 response with a Location header from the server instead. Otherwise, you'll have to parse the HTML and follow the redirect manually, or use something like mechanize.
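To illustrate the 302 route, the following is a self-contained sketch in Python 3 (where `urllib2` became `urllib.request`). It starts a throwaway local server that answers `/test.html` with a 302, then fetches it and reads the final address from `geturl()`. The path `/city/today.jpg`, the placeholder payload and all function names are assumptions made for the demo, not the asker's real setup.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener

IMAGE_BYTES = b"fake-jpeg-bytes"  # stand-in payload, not a real image


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/test.html":
            # A real HTTP redirect: status 302 plus a Location header.
            self.send_response(302)
            self.send_header("Location", "/city/today.jpg")
            self.end_headers()
        elif self.path == "/city/today.jpg":
            self.send_response(200)
            self.send_header("Content-Type", "image/jpeg")
            self.send_header("Content-Length", str(len(IMAGE_BYTES)))
            self.end_headers()
            self.wfile.write(IMAGE_BYTES)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_following_redirects(url):
    """Return (final_url, body); the opener follows 3XX redirects itself."""
    opener = build_opener(ProxyHandler({}))  # ignore proxy environment settings
    with opener.open(url, timeout=10) as resp:
        return resp.geturl(), resp.read()


def demo():
    """Run the local server, fetch /test.html, and return the result."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        return fetch_following_redirects("http://127.0.0.1:%d/test.html" % port)
    finally:
        server.shutdown()
        server.server_close()
```

With a redirect of this kind, the client needs no HTML parsing: `demo()` returns the URL of the image it was redirected to together with the image bytes, which can be written straight to a file.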

Upvotes: 1
