Reputation: 27
I am trying to download a captcha image from a VBB board using "mechanize" (built on urllib2). This is where the captcha is located (log in with any username and password and you will be asked for a captcha):
<img id="imagereg" src="image.php?type=hv&hash=c76c6f3c2e0fc3bf32fd99d36555fa04" alt="" width="201" height="61" border="0" />
I tried to retrieve that image, but what gets downloaded is a .php file:
br.open('http://www.amaderforum.com/image.php?type=hv&hash=c76c6f3c2e0fc3bf32fd99d36555fa04')
I changed the file extension to an image extension, but the result is not the captcha image. Any help?
Below is some info from the headers:
GET /image.php?type=hv&hash=c76c6f3c2e0fc3bf32fd99d36555fa04 HTTP/1.1\r\nAccept-Encoding: identity\r\nHost: www.amaderforum.com\r\nCookie: bbsessionhash=25e24573ce64dfc95dbb873667f21787; bblastvisit=1312644421; bblastactivity=0\r\nConnection: close\r\nUser-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.17) Gecko/20110420 Firefox/3.6.17\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Sat, 06 Aug 2011 15:30:48 GMT
header: Server: Apache
header: X-Powered-By: PHP/5.2.9
header: Content-transfer-encoding: binary
header: Content-disposition: inline; filename=image.jpg
header: Content-Length: 5745
header: Connection: close
header: Content-Type: image/jpeg
Upvotes: 0
Views: 5900
Reputation: 562
Here is a short script that goes to the lost password page, finds the captcha, and downloads the image to out.jpg.
This script requires the lxml library.
Hope this helps. Cheers!
import urllib2
import lxml.html

# Browser-like request headers, sent with both requests.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) '
                  'Gecko/20100101 Firefox/4.0.1',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-us,en;q=0.5',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
}

# Fetch the lost-password page, which embeds the captcha image.
req = urllib2.Request('http://www.amaderforum.com/login.php?do=lostpw', None,
                      headers)
f = urllib2.urlopen(req)
page = f.read()

# Find the captcha <img> by its id and build the absolute image URL.
tree = lxml.html.fromstring(page)
imgurl = ("http://www.amaderforum.com/" +
          tree.xpath(".//img[@id='imagereg']")[0].get('src'))

# Download the image bytes and write them in binary mode.
req = urllib2.Request(imgurl, None, headers)
f = urllib2.urlopen(req)
img = f.read()
with open('out.jpg', 'wb') as out:
    out.write(img)
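As a possible refinement, here is a minimal sketch of three small helpers around the same idea: resolving the img src against the page URL with urljoin instead of string concatenation, checking that the downloaded bytes really start like a JPEG before saving, and reusing one cookie-aware opener for both requests. The helper names (absolute_captcha_url, looks_like_jpeg, make_session_opener, save_captcha) are made up for illustration. Whether this forum ties the captcha hash to the session cookie is an assumption; the request in the question does send a bbsessionhash cookie, so sharing cookies seems a reasonable precaution. The imports fall back to the Python 2 module names used above.

```python
try:  # Python 3 module names
    from urllib.parse import urljoin
    from urllib.request import build_opener, HTTPCookieProcessor
    from http.cookiejar import CookieJar
except ImportError:  # Python 2, as in the script above
    from urlparse import urljoin
    from urllib2 import build_opener, HTTPCookieProcessor
    from cookielib import CookieJar


def absolute_captcha_url(page_url, src):
    """Resolve the <img> src attribute against the URL of the page it came from."""
    return urljoin(page_url, src)


def looks_like_jpeg(data):
    """Return True if the bytes start with the JPEG start-of-image marker."""
    return data[:2] == b'\xff\xd8'


def make_session_opener():
    """Hypothetical helper: build an opener that keeps cookies between requests."""
    jar = CookieJar()
    return build_opener(HTTPCookieProcessor(jar)), jar


def save_captcha(opener, page_url, src, path='out.jpg'):
    """Download the captcha with the given opener and write it in binary mode."""
    data = opener.open(absolute_captcha_url(page_url, src)).read()
    if not looks_like_jpeg(data):
        # An HTML error page or a login redirect would end up here.
        raise ValueError('response does not look like a JPEG image')
    with open(path, 'wb') as out:
        out.write(data)
    return path
```

To use it, you would fetch the login or lost-password page with the opener returned by make_session_opener(), extract the src of the imagereg image as in the script above, and pass both to save_captcha(). If the check fails, the server most likely returned something other than the image, which may be what happened when the saved file turned out not to be the captcha.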
Upvotes: 1