Is there a way to extract the CSS selector with Selenium?

Question

I'm trying to scrape this (https://www.qconcursos.com/questoes-de-concursos/questoes?discipline_ids%5B%5D=13&discipline_ids%5B%5D=16&discipline_ids%5B%5D=39&discipline_ids%5B%5D=46&discipline_ids%5B%5D=56&discipline_ids%5B%5D=57&examining_board_ids%5B%5D=1&examining_board_ids%5B%5D=2&examining_board_ids%5B%5D=5&page=2&scholarity_ids%5B%5D=1&scholarity_ids%5B%5D=2) webpage, and I'm extracting all the images on it. However, the img tags have no size (width, height) attributes, so the images are extracted at their original size, which is far too big. That's why I'm reading each image's rendered size and adding width and height attributes to every img tag.

Example:

has to become

I'm able to get all the images and their correct sizes. My problem is that I can't insert those values into the tags.

This is the code I'm trying to use:

driver.execute_script(f'let element =  document.querySelector("#image_sec>img"); element.setAttribute("width", "{w}"); element.setAttribute("height", "{h}");')

So I need a CSS selector for each image, so that JavaScript can find that element and set the attributes on its tag.
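For reference, one way to get a selector for every image is to derive a positional selector (tag names plus :nth-of-type) from driver.page_source. The sketch below uses only the standard library's html.parser; img_selectors is a hypothetical helper name. It assumes the page source mirrors the live DOM at the moment it is read, that the page does not change before the selectors are used (positional selectors break if it does), and it ignores images inside iframes.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they are never pushed on the stack.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class _ImgSelectorBuilder(HTMLParser):
    """Walks the markup and records a positional CSS selector for each <img>."""

    def __init__(self):
        super().__init__()
        self.stack = []           # (tag, nth_of_type) for each open element
        self.child_counts = [{}]  # per open element: tag -> children of that tag seen so far
        self.selectors = []

    def handle_starttag(self, tag, attrs):
        counts = self.child_counts[-1]
        counts[tag] = counts.get(tag, 0) + 1
        step = (tag, counts[tag])
        if tag == "img":
            path = self.stack + [step]
            # The first step (normally <html>) is written without :nth-of-type.
            parts = [path[0][0]] + [f"{t}:nth-of-type({n})" for t, n in path[1:]]
            self.selectors.append(" > ".join(parts))
        if tag not in VOID_TAGS:
            self.stack.append(step)
            self.child_counts.append({})

    # Self-closing tags such as <img ... /> go through HTMLParser's default
    # handle_startendtag, which calls handle_starttag and then handle_endtag.
    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Pop back to the most recent open element with this tag name.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                del self.child_counts[i + 1:]
                return


def img_selectors(html):
    """Return one CSS selector per <img>, in document order (hypothetical helper)."""
    builder = _ImgSelectorBuilder()
    builder.feed(html)
    builder.close()
    return builder.selectors
```

Each string returned by img_selectors(driver.page_source) could then stand in for "#image_sec>img" in the document.querySelector(...) call. Passing the WebElement itself to execute_script avoids the need for selectors altogether.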

Here is code you can use to reproduce the problem:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
from bs4 import BeautifulSoup
import undetected_chromedriver as uc

options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-logging"])
options.add_argument("start-maximized")
driver = uc.Chrome(options=options)
driver.get("https://www.qconcursos.com/questoes-de-concursos/questoes?discipline_ids%5B%5D=13&discipline_ids%5B%5D=16&discipline_ids%5B%5D=39&discipline_ids%5B%5D=46&discipline_ids%5B%5D=56&discipline_ids%5B%5D=57&examining_board_ids%5B%5D=1&examining_board_ids%5B%5D=2&examining_board_ids%5B%5D=5&page=2&scholarity_ids%5B%5D=1&scholarity_ids%5B%5D=2")

soup = BeautifulSoup(driver.page_source, 'html.parser')
link = driver.current_url

try:
    # find_elements returns an empty list instead of raising NoSuchElementException
    images = driver.find_elements(By.XPATH, '//img')
    for img in images:
        # Rendered (on-screen) size of this image: a dict with 'width' and 'height'
        size = img.size
        w, h = size['width'], size['height']

        # This is where it fails: the fixed selector always targets the first
        # element matching "#image_sec>img" (or nothing), never the current img
        driver.execute_script(f'let element = document.querySelector("#image_sec>img"); element.setAttribute("width", "{w}"); element.setAttribute("height", "{h}");')

except NoSuchElementException:
    pass

Update: I've tried following this solution, without success: Is there a way to extract the CSS selector with Selenium? There are two answers there. The first one retrieves the tags but doesn't add any attributes, and the same goes for the second one.

MyDisplay · Accepted Answer

I've found a solution. https://www.reddit.com/r/learnpython/comments/pgp5cg/how_can_i_extract_a_css_selector_of_an_element/hbcydib/?context=3

driver.execute_script(f'arguments[0].setAttribute("width", "{w}"); arguments[0].setAttribute("height", "{h}");', img)

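I'm not very familiar with JavaScript, but as I understand it, any extra arguments passed to execute_script are available inside the script through the arguments array. Here arguments[0] is my img WebElement, which Selenium hands to the script as the corresponding DOM element, so the script sets the new attributes directly on that img element.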
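The same mechanism works for more than one value: each further positional argument to execute_script becomes arguments[1], arguments[2], and so on, so the sizes can be passed in instead of being formatted into the JavaScript string with an f-string. A minimal sketch; set_rendered_size and SET_SIZE_JS are hypothetical names, and rounding to whole numbers is an added choice because the HTML width and height attributes expect non-negative integers.

```python
# JavaScript run in the page: arguments[0] is the element, arguments[1] and
# arguments[2] are the width and height passed after it.
SET_SIZE_JS = (
    'arguments[0].setAttribute("width", arguments[1]);'
    'arguments[0].setAttribute("height", arguments[2]);'
)


def set_rendered_size(driver, element):
    """Copy an element's rendered size into its width/height attributes.

    Hypothetical helper: element.size is Selenium's rendered size (a dict with
    'width' and 'height'); values are rounded to whole numbers.
    """
    size = element.size
    w, h = round(size["width"]), round(size["height"])
    # Extra positional arguments become arguments[1], arguments[2], ... in the script.
    driver.execute_script(SET_SIZE_JS, element, w, h)
    return w, h


# Usage with a live driver:
# for img in driver.find_elements(By.TAG_NAME, "img"):
#     set_rendered_size(driver, img)
```

Passing values as arguments also means quotes or other special characters in them cannot break the script text.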

Here is the working example:

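from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException

try:
    images = driver.find_elements(By.XPATH, './/img')

    for img in images:
        # Rendered (on-screen) size of this image
        size = img.size
        w, h = size['width'], size['height']

        # arguments[0] is the img element passed as the extra argument below
        driver.execute_script(f'arguments[0].setAttribute("width", "{w}"); arguments[0].setAttribute("height", "{h}");', img)
except NoSuchElementException:
    pass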
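Because driver.page_source serializes the current DOM, the attributes set by the loop above should also appear in it, so they can be checked without going back to the browser. A sketch using only html.parser from the standard library; img_sizes is a hypothetical helper name, and attribute values come back as strings (or None when missing).

```python
from html.parser import HTMLParser


class _ImgSizeCollector(HTMLParser):
    """Collects (src, width, height) for every <img> tag it sees."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> also arrives here via handle_startendtag.
        if tag == "img":
            a = dict(attrs)
            self.rows.append((a.get("src"), a.get("width"), a.get("height")))


def img_sizes(html):
    """Return (src, width, height) tuples in document order (hypothetical helper)."""
    parser = _ImgSizeCollector()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, [src for src, w, h in img_sizes(driver.page_source) if w is None] lists any image that still lacks a width attribute.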
