urllib2 download captcha image

Question

I am trying to download a captcha image from a VBB board using "mechanize" (urllib2-based). This is where the captcha is located (log in with any username and password and you will be asked for a captcha):

I tried to retrieve that image, but it downloads a .php file:

br.open('http://www.amaderforum.com/image.php?type=hv&hash=c76c6f3c2e0fc3bf32fd99d36555fa04')

I changed the file extension to an image extension, but the result is not the captcha image. Any help?

Below is some info from the headers:

GET /image.php?type=hv&hash=c76c6f3c2e0fc3bf32fd99d36555fa04 HTTP/1.1
Accept-Encoding: identity
Host: www.amaderforum.com
Cookie: bbsessionhash=25e24573ce64dfc95dbb873667f21787; bblastvisit=1312644421; bblastactivity=0
Connection: close
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.17) Gecko/20110420 Firefox/3.6.17

reply: 'HTTP/1.1 200 OK'
header: Date: Sat, 06 Aug 2011 15:30:48 GMT
header: Server: Apache
header: X-Powered-By: PHP/5.2.9
header: Content-transfer-encoding: binary
header: Content-disposition: inline; filename=image.jpg
header: Content-Length: 5745
header: Connection: close
header: Content-Type: image/jpeg
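
A note on these headers: the response is declared as Content-Type: image/jpeg, so the body should already be JPEG data even though the URL ends in .php. One way to check what was actually received is to look at the first bytes of the download. The sketch below is only an illustration; the function name sniff_image_type is hypothetical and not part of mechanize or urllib2.

```python
def sniff_image_type(data):
    """Guess the image format from the leading magic bytes.

    Returns 'jpeg', 'png', 'gif', or None if nothing matches.
    """
    if data.startswith(b'\xff\xd8\xff'):          # JPEG start-of-image marker
        return 'jpeg'
    if data.startswith(b'\x89PNG\r\n\x1a\n'):     # PNG signature
        return 'png'
    if data[:6] in (b'GIF87a', b'GIF89a'):        # GIF signatures
        return 'gif'
    return None


# Hypothetical usage with the mechanize browser from the question:
#   data = br.open(url).read()
#   if sniff_image_type(data) == 'jpeg':
#       with open('captcha.jpg', 'wb') as fh:   # binary mode matters on Windows
#           fh.write(data)
```

If this returns None, the server probably sent something other than an image (an HTML error page, for example), and renaming the file will not help.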

odie5533 · Accepted Answer

Here is a short script that goes to the lost password page, finds the captcha, and downloads the image to out.jpg.

This script requires the lxml library.

Hope this helps. Cheers!

import urllib2
import urlparse

import lxml.html

BASE_URL = 'http://www.amaderforum.com/'

# Browser-like headers, sent with both requests.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) '
                  'Gecko/20100101 Firefox/4.0.1',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-us,en;q=0.5',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7'}

# Fetch the lost-password page, which embeds the captcha.
req = urllib2.Request(BASE_URL + 'login.php?do=lostpw', None, headers)
page = urllib2.urlopen(req).read()

# The captcha is the <img> with id "imagereg"; resolve its src against the site.
tree = lxml.html.fromstring(page)
src = tree.xpath(".//img[@id='imagereg']")[0].get('src')
imgurl = urlparse.urljoin(BASE_URL, src)

# Download the image and write it out in binary mode.
req = urllib2.Request(imgurl, None, headers)
img = urllib2.urlopen(req).read()

with open('out.jpg', 'wb') as out:
    out.write(img)
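
For readers on Python 3, where urllib2 no longer exists, here is a rough standard-library sketch of the same idea. It swaps lxml for html.parser and adds a cookie jar so both requests share one session; whether the forum actually requires the session cookie for the captcha is an assumption. The names CaptchaFinder, captcha_url and download_captcha are hypothetical, and the 'imagereg' id is taken from the script above.

```python
import urllib.request
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from urllib.parse import urljoin


class CaptchaFinder(HTMLParser):
    """Record the src of the first <img> whose id equals target_id."""

    def __init__(self, target_id='imagereg'):
        super().__init__()
        self.target_id = target_id
        self.src = None

    def handle_starttag(self, tag, attrs):
        if tag == 'img' and self.src is None:
            attrs = dict(attrs)
            if attrs.get('id') == self.target_id:
                self.src = attrs.get('src')


def captcha_url(page_html, page_url):
    """Return the absolute captcha URL found in page_html, or None."""
    finder = CaptchaFinder()
    finder.feed(page_html)
    if finder.src is None:
        return None
    return urljoin(page_url, finder.src)


def download_captcha(page_url, out_path='out.jpg'):
    """Fetch page_url, locate the captcha, and save it to out_path."""
    # One opener with a cookie jar, so the image request reuses the session.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar()))
    opener.addheaders = [('User-Agent', 'Mozilla/5.0')]

    with opener.open(page_url) as resp:
        page_html = resp.read().decode('utf-8', errors='replace')

    url = captcha_url(page_html, page_url)
    if url is None:
        raise ValueError('no captcha image found on %s' % page_url)

    with opener.open(url) as resp:
        data = resp.read()
    with open(out_path, 'wb') as fh:
        fh.write(data)
    return url
```

Calling download_captcha('http://www.amaderforum.com/login.php?do=lostpw') would mirror the script above; the parsing step can be tried offline by passing any HTML string to captcha_url.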
