Reputation: 107
I'm trying to scrape this webpage (https://www.qconcursos.com/questoes-de-concursos/questoes?discipline_ids%5B%5D=13&discipline_ids%5B%5D=16&discipline_ids%5B%5D=39&discipline_ids%5B%5D=46&discipline_ids%5B%5D=56&discipline_ids%5B%5D=57&examining_board_ids%5B%5D=1&examining_board_ids%5B%5D=2&examining_board_ids%5B%5D=5&page=2&scholarity_ids%5B%5D=1&scholarity_ids%5B%5D=2). I'm extracting all the images on the page. However, the img tags have no size attributes (width, height), so the images are extracted at their original size and end up far too big. That is why I'm reading each image's rendered size and adding width and height attributes to every img tag.
Example:
<img src="https://s3.amazonaws.com/assets.qconcursos-hmg.com/cms/brazil-week/logo.svg">
has to become
<img src="https://s3.amazonaws.com/assets.qconcursos-hmg.com/cms/brazil-week/logo.svg" height="32" width="120">
I'm able to get all the images and their correct sizes. My problem is that I can't insert the values into the tags.
This is the code I'm trying to use:
driver.execute_script(f'let element = document.querySelector("#image_sec>img"); element.setAttribute("width", "{w}"); element.setAttribute("height", "{h}");')
So I need the CSS selector of each element in order to find it with JavaScript and set the attributes on the tag.
This is code you can use to reproduce the problem:
from selenium import webdriver
from selenium.webdriver import ChromeOptions, Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import NoSuchElementException, TimeoutException
from bs4 import BeautifulSoup
import requests
import undetected_chromedriver as uc
from scrapy.selector import Selector

options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-logging"])
options.add_argument("start-maximized")
driver = uc.Chrome(options=options)
driver.get("https://www.qconcursos.com/questoes-de-concursos/questoes?discipline_ids%5B%5D=13&discipline_ids%5B%5D=16&discipline_ids%5B%5D=39&discipline_ids%5B%5D=46&discipline_ids%5B%5D=56&discipline_ids%5B%5D=57&examining_board_ids%5B%5D=1&examining_board_ids%5B%5D=2&examining_board_ids%5B%5D=5&page=2&scholarity_ids%5B%5D=1&scholarity_ids%5B%5D=2")

while True:
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    link = driver.current_url
    try:
        images = driver.find_elements(By.XPATH, '//img')
        for img in images:
            size = img.size
            w, h = size['width'], size['height']
            driver.execute_script(f'let element = document.querySelector("#image_sec>img"); element.setAttribute("width", "{w}"); element.setAttribute("height", "{h}");')
    except NoSuchElementException:
        pass
Update: I've tried following the solutions in "Is there a way to extract the CSS selector with Selenium?", but without success. That question has 2 answers. The first one retrieves the tags but doesn't add any attributes, and the same goes for the second one.
Upvotes: 0
Views: 396
Reputation: 107
I've found a solution. https://www.reddit.com/r/learnpython/comments/pgp5cg/how_can_i_extract_a_css_selector_of_an_element/hbcydib/?context=3
driver.execute_script(f'arguments[0].setAttribute("width", "{w}"); arguments[0].setAttribute("height", "{h}");', img)
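I'm not very familiar with JavaScript, so don't quote me on that. As far as I can tell, arguments[0] refers to the first extra argument passed to execute_script, which here is my img element. Selenium hands that WebElement to the script as the matching DOM element, so the script sets the new attributes directly on my img element and no CSS selector is needed.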
This is the working example
try:
    images = driver.find_elements(By.XPATH, './/img')
    for img in images:
        size = img.size
        w, h = size['width'], size['height']
        driver.execute_script(f'arguments[0].setAttribute("width", "{w}"); arguments[0].setAttribute("height", "{h}");', img)
except NoSuchElementException:
    pass
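A possible variation on the loop above, offered only as a sketch: since execute_script accepts any number of extra arguments, the width and height can be passed the same way as the element instead of being formatted into the script string. The names SET_SIZE_JS and set_rendered_size are made up for this illustration and are not part of Selenium; the function assumes `driver` is a Selenium WebDriver and `images` is a list of WebElements.

```python
# Hypothetical helper names; only execute_script and .size come from Selenium.
SET_SIZE_JS = (
    'arguments[0].setAttribute("width", arguments[1]);'
    'arguments[0].setAttribute("height", arguments[2]);'
)


def set_rendered_size(driver, images):
    """Write each image's rendered size into its width/height attributes.

    Returns the number of images that were updated.
    """
    updated = 0
    for img in images:
        size = img.size  # rendered size as a dict with "width" and "height"
        # Extra positional arguments appear in the script as
        # arguments[0], arguments[1] and arguments[2].
        driver.execute_script(
            SET_SIZE_JS, img, str(size["width"]), str(size["height"])
        )
        updated += 1
    return updated
```

With the driver from the question, this would be called as set_rendered_size(driver, driver.find_elements(By.XPATH, './/img')). Keeping the values out of the script string avoids any quoting issues, though for plain numbers the f-string version behaves the same.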
Upvotes: 2
Reputation: 14
This website is protected by Cloudflare, and your code doesn't work.
Upvotes: 0