How to return X elements [Selenium]?

Question

A page loads 35,000 elements, of which only the first 10 are of interest to me. Returning all of the elements makes the scraping extremely slow. So far I have only succeeded in either returning the first element with:

driver.find_element_by

Or returning all 35,000 elements with:

driver.find_elements_by

Does anyone know a way to return only the first x elements found?

Louis · Accepted Answer

Selenium does not provide a facility for returning only a slice of the results of the .find_elements... calls. A general solution, if you want to avoid having Selenium return every single element, is to perform the slice operation on the browser side, in JavaScript. I present this solution in this answer here. If you want to use XPath for selecting the DOM nodes, you could adapt the answer here to that, or you could use the method in another answer I've submitted.

from selenium import webdriver
driver = webdriver.Chrome()
driver.get("http://www.example.com")

# We add 35000 paragraphs with class `test` to the page so that we can
# later show how to get the first 10 paragraphs of this class. Each
# paragraph is uniquely numbered.
driver.execute_script("""
var html = [];
for (var i = 0; i < 35000; ++i) {
  html.push("<p class='test'>" + i + "</p>");
}
document.body.innerHTML += html.join("");
""")

elements = driver.execute_script("""
return Array.prototype.slice.call(document.querySelectorAll("p.test"), 0, 10);
""")

# Verify that we got the first 10 elements by outputting the text they
# contain to the console. The loop here is for illustration purposes
# to show that the `elements` array contains what we want. In real
# code, if I wanted to process the text of the first 10 elements, I'd
# do what I show next.
for element in elements:
    print(element.text)

# A better way to get the text of the first 10 elements. This results
# in 1 round-trip between this script and the browser. The loop above
# would take 10 round-trips.
print(driver.execute_script("""
return Array.prototype.slice.call(document.querySelectorAll("p.test"), 0, 10)
           .map(function (x) { return x.textContent; });
"""))

driver.quit()
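
The script above hard-codes the selector and the count. As a minimal sketch, the same slice can be wrapped in a small helper that passes both values in as script arguments. The name `first_n_by_css` and the constant `FIRST_N_BY_CSS` are hypothetical and not part of Selenium; the only Selenium call assumed is `execute_script(script, *args)`, which exposes the extra arguments to the script as `arguments[0]`, `arguments[1]`, and so on.

```python
# Hypothetical helper, not part of Selenium: the slicing still happens in
# the browser, so only the requested elements are sent back to Python.
FIRST_N_BY_CSS = """
return Array.prototype.slice.call(
    document.querySelectorAll(arguments[0]), 0, arguments[1]);
"""


def first_n_by_css(driver, css_selector, n):
    """Return at most `n` elements matching `css_selector`, in document order."""
    # Extra positional arguments are visible in the script as
    # arguments[0] (the selector) and arguments[1] (the count).
    return driver.execute_script(FIRST_N_BY_CSS, css_selector, n)
```

With a live driver this would be called as `elements = first_n_by_css(driver, "p.test", 10)`. Passing the selector as an argument rather than formatting it into the script text avoids quoting problems when the selector itself contains quotes.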

The Array.prototype.slice.call rigmarole is needed because what document.querySelectorAll returns looks like an Array but is not actually an Array object. (It is a NodeList.) So it does not have a .slice method but you can pass it to Array's slice method.
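
For the XPath case mentioned earlier, one possible adaptation is sketched below. It is not necessarily what the answers referred to above do. It assumes the browser supports the standard DOM `document.evaluate` API with an ordered snapshot result, which can be indexed directly, so no slice is needed. The names `first_n_by_xpath` and `FIRST_N_BY_XPATH` are hypothetical.

```python
# Hypothetical helper, not part of Selenium: evaluates the XPath in the
# browser and returns only the first `n` matching nodes to Python.
FIRST_N_BY_XPATH = """
var snapshot = document.evaluate(
    arguments[0], document, null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
var count = Math.min(arguments[1], snapshot.snapshotLength);
var nodes = [];
for (var i = 0; i < count; ++i) {
    nodes.push(snapshot.snapshotItem(i));
}
return nodes;
"""


def first_n_by_xpath(driver, xpath, n):
    """Return at most `n` nodes matching `xpath`, in document order."""
    # arguments[0] is the XPath expression, arguments[1] is the count.
    return driver.execute_script(FIRST_N_BY_XPATH, xpath, n)
```

With a live driver this would be called as `first_n_by_xpath(driver, "//p[@class='test']", 10)`. The browser still evaluates the expression against the whole document; the saving is that only `n` nodes are sent back to the Python side.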
