Reputation: 439
I am trying to download a captcha image with Selenium; however, the image I download is different from the one shown in the browser. If I download the image again, without changing anything in the browser, I get yet another one.
Any thoughts?
from selenium import webdriver
import urllib.request
driver = webdriver.Firefox()
driver.get("http://sistemas.cvm.gov.br/?fundosreg")
# Change frame.
driver.switch_to.frame("Main")
# Download image/captcha.
img = driver.find_element_by_xpath(".//*[@id='trRandom3']/td[2]/img")
src = img.get_attribute('src')
urllib.request.urlretrieve(src, "captcha.jpeg")
Upvotes: 7
Views: 19631
Reputation: 42518
You can get the rendered image of the captcha with a piece of JavaScript. It is faster than taking and cropping a screenshot:
import base64
from selenium import webdriver
driver = webdriver.Firefox()
driver.set_script_timeout(10)
driver.get("http://sistemas.cvm.gov.br/?fundosreg")
driver.switch_to.frame("Main")
# find the captcha element
ele_captcha = driver.find_element_by_xpath("//img[contains(./@src, 'RandomTxt.aspx')]")
# get the captcha as a base64 string
img_captcha_base64 = driver.execute_async_script("""
    var ele = arguments[0], callback = arguments[1];
    ele.addEventListener('load', function fn(){
      ele.removeEventListener('load', fn, false);
      var cnv = document.createElement('canvas');
      cnv.width = this.width; cnv.height = this.height;
      cnv.getContext('2d').drawImage(this, 0, 0);
      // 'data:image/jpeg;base64,' is 23 characters long
      callback(cnv.toDataURL('image/jpeg').substring(23));
    }, false);
    ele.dispatchEvent(new Event('load'));
    """, ele_captcha)
# save the captcha to a file
with open(r"captcha.jpg", 'wb') as f:
    f.write(base64.b64decode(img_captcha_base64))
EDIT:
Selenium removed the find_element_by_xpath method in version 4.3.0. See the CHANGES:
https://github.com/SeleniumHQ/selenium/blob/a4995e2c096239b42c373f26498a6c9bb4f2b3e7/py/CHANGES
Selenium 4.3.0
* Deprecated find_element_by_* and find_elements_by_* are now removed (#10712)
* Deprecated Opera support has been removed (#10630)
* Fully upgraded from python 2x to 3.7 syntax and features (#10647)
* Added a devtools version fallback mechanism to look for an older version when mismatch occurs (#10749)
* Better support for co-operative multi inheritance by utilising super() throughout
* Improved type hints throughout
The call must be changed from:
ele_captcha = driver.find_element_by_xpath("//img[contains(./@src, 'RandomTxt.aspx')]")
to:
ele_captcha = driver.find_element("xpath", "//img[contains(./@src, 'RandomTxt.aspx')]")
Full working script:
import base64
from selenium import webdriver
driver = webdriver.Firefox()
driver.set_script_timeout(10)
driver.get("http://sistemas.cvm.gov.br/?fundosreg")
driver.switch_to.frame("Main")
# find the captcha element
ele_captcha = driver.find_element("xpath", "//img[contains(./@src, 'RandomTxt.aspx')]")
# get the captcha as a base64 string
img_captcha_base64 = driver.execute_async_script("""
    var ele = arguments[0], callback = arguments[1];
    ele.addEventListener('load', function fn(){
      ele.removeEventListener('load', fn, false);
      var cnv = document.createElement('canvas');
      cnv.width = this.width; cnv.height = this.height;
      cnv.getContext('2d').drawImage(this, 0, 0);
      // 'data:image/jpeg;base64,' is 23 characters long
      callback(cnv.toDataURL('image/jpeg').substring(23));
    }, false);
    ele.dispatchEvent(new Event('load'));
    """, ele_captcha)
# save the captcha to a file
with open(r"captcha.jpg", 'wb') as f:
    f.write(base64.b64decode(img_captcha_base64))
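A side note on the substring(...) call: the length of the prefix it strips depends on the MIME type ('data:image/jpeg;base64,' is 23 characters, 'data:image/png;base64,' is 22). A minimal sketch of an alternative, assuming the script is changed to return the full toDataURL() string and the prefix is stripped on the Python side. The helper name data_url_to_bytes is hypothetical and the sample payload is made up:

```python
import base64


def data_url_to_bytes(data_url: str) -> bytes:
    """Decode the payload of a base64 data URL, e.g. 'data:image/png;base64,...'."""
    # Split at the first comma: everything before it is the header.
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    return base64.b64decode(payload)


# Made-up payload for illustration (the bytes b"captcha", not a real image).
sample = "data:image/jpeg;base64," + base64.b64encode(b"captcha").decode("ascii")
print(data_url_to_bytes(sample))
```

With this, the same Python code keeps working if the format passed to toDataURL() is changed later.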
Upvotes: 20
Reputation: 2400
If the image is already loaded, you can use execute_script instead of execute_async_script:
import base64
# `browser` is an existing WebDriver instance
img_base64 = browser.execute_script("""
    var ele = arguments[0];
    var cnv = document.createElement('canvas');
    cnv.width = ele.width; cnv.height = ele.height;
    cnv.getContext('2d').drawImage(ele, 0, 0);
    // 'data:image/jpeg;base64,' is 23 characters long
    return cnv.toDataURL('image/jpeg').substring(23);
    """, browser.find_element("xpath", "//your_xpath"))
with open(r"image.jpg", 'wb') as f:
    f.write(base64.b64decode(img_base64))
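Whether the image "is already loaded" can be checked from the page itself. A small sketch, assuming the standard DOM properties complete and naturalWidth on the img element; the helper name image_is_loaded is hypothetical, and driver is any object with Selenium's execute_script method:

```python
def image_is_loaded(driver, element) -> bool:
    """Return True if the browser reports the <img> as loaded and decoded."""
    # `complete` is true once loading has finished (successfully or not);
    # `naturalWidth > 0` filters out images that failed to load.
    return bool(driver.execute_script(
        "return arguments[0].complete && arguments[0].naturalWidth > 0;",
        element,
    ))
```

If this returns False, the asynchronous version with the 'load' listener is probably the safer choice.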
Upvotes: 14
Reputation: 22272
Because the URL in the image's src returns a new random captcha image every time it is requested!
Instead of downloading the file from the image's src, you can take a screenshot to capture the one shown in the browser. However, you need to install Pillow (pip install Pillow) and use it as described in this answer:
from PIL import Image
from selenium import webdriver
def get_captcha(driver, element, path):
    # position and size of the captcha element
    location = element.location
    size = element.size
    # saves screenshot of entire page
    driver.save_screenshot(path)
    # uses PIL library to open image in memory
    image = Image.open(path)
    # the + 140 is a vertical offset specific to this page
    # (presumably the frame's position within the full screenshot)
    left = location['x']
    top = location['y'] + 140
    right = location['x'] + size['width']
    bottom = location['y'] + size['height'] + 140
    image = image.crop((left, top, right, bottom))  # defines crop points
    # JPEG has no alpha channel, so convert before saving
    image.convert('RGB').save(path, 'jpeg')  # saves new cropped image

driver = webdriver.Firefox()
driver.get("http://sistemas.cvm.gov.br/?fundosreg")
# change frame
driver.switch_to.frame("Main")
# download image/captcha
# (find_element_by_xpath was removed in Selenium 4.3.0)
img = driver.find_element("xpath", ".//*[@id='trRandom3']/td[2]/img")
get_captcha(driver, img, "captcha.jpeg")
(Note that I've changed the code a little so that it works in your case.)
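The crop rectangle in get_captcha is plain arithmetic on element.location and element.size plus a fixed offset. A sketch that isolates that step so the offset is a parameter instead of a literal; the function name crop_box and the scale parameter (for displays where the screenshot has more pixels than CSS pixels) are assumptions, not part of the code above:

```python
def crop_box(location, size, offset_x=0, offset_y=0, scale=1.0):
    """Return (left, top, right, bottom) in screenshot pixels for Image.crop()."""
    left = (location["x"] + offset_x) * scale
    top = (location["y"] + offset_y) * scale
    right = left + size["width"] * scale
    bottom = top + size["height"] * scale
    return tuple(int(round(v)) for v in (left, top, right, bottom))


# Made-up element geometry, with the 140-pixel vertical offset used above.
print(crop_box({"x": 10, "y": 20}, {"width": 180, "height": 50}, offset_y=140))
# -> (10, 160, 190, 210)
```

The result can be passed straight to image.crop(...). Recent Selenium versions also have a screenshot(filename) method on the element itself, which may make the cropping unnecessary.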
Upvotes: 16