Python selenium webdriver code performance

Question

I am scraping a webpage using Selenium in Python. I am able to locate the elements using this code:

from selenium import webdriver
import codecs

driver = webdriver.Chrome()
driver.get("url")
results_table=driver.find_elements_by_xpath('//*[@id="content"]/table[1]/tbody/tr')

Each element in results_table is in turn a set of sub-elements, with the number of sub-elements varying from element to element. My goal is to output each element, as a list or as a delimited string, into an output file. My code so far is this:

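results_file = codecs.open(path + "results.txt", "w", "cp1252")

# enumerate() yields (index, element) pairs, so unpack both
for i, element in enumerate(results_table):
    # leaf descendants that contain text: one WebDriver call per row
    element_fields = element.find_elements_by_xpath(".//*[text()][count(*)=0]")
    # each .text access is another WebDriver call per field
    element_list = [field.text for field in element_fields]
    stuff_to_write = "#".join(element_list) + "\n"
    results_file.write(stuff_to_write)
    # print(i)
results_file.close()
driver.quit()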

This second part of the code takes about 2.5 minutes on a list of ~400 elements, each with about 10 sub-elements. I get the desired output, but it is too slow. What could I do to improve the performance?

Using Python 3.6.

GaryMBloom · Accepted Answer

Download the whole page in one shot, then use something like BeautifulSoup to process it. I haven't used Splinter or Selenium in a while, but in Splinter, .html will give you the page. I'm not sure what the syntax is for that in Selenium, but there should be a way to grab the whole page.
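A minimal sketch of the "process it offline" half, assuming the page's HTML is already available as a string and that BeautifulSoup (bs4) is installed. The function names rows_from_html and _is_leaf_with_text are hypothetical; the code tries to mirror the question's two XPath expressions with BeautifulSoup searches rather than XPath.

```python
from bs4 import BeautifulSoup


def _is_leaf_with_text(tag):
    # Approximates the XPath predicate [text()][count(*)=0]: the tag has no
    # child elements and contains some non-whitespace text.
    return tag.find(True) is None and tag.get_text(strip=True) != ""


def rows_from_html(html):
    """Return one list of field strings per row, mirroring the question's
    //*[@id="content"]/table[1]/tbody/tr selector (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find(id="content")
    if content is None:
        return []
    # table[1]: the first <table> that is a direct child of #content
    table = content.find("table", recursive=False)
    if table is None:
        return []
    # Browsers insert <tbody>; html.parser does not, so fall back to the table.
    body = table.find("tbody", recursive=False)
    if body is None:
        body = table
    return [
        [leaf.get_text(strip=True) for leaf in row.find_all(_is_leaf_with_text)]
        for row in body.find_all("tr", recursive=False)
    ]
```

Writing the file is then the same "#".join(fields) + "\n" loop as in the question, but over plain Python lists, so no browser calls happen inside the loop. One caveat: Selenium's .text returns only visible text, while BeautifulSoup also sees hidden elements, so the two outputs can differ on pages that hide content with CSS.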

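Selenium (and Splinter, which is layered on top of Selenium) is notoriously slow for randomly accessing web page content, because every find_elements_by_xpath call and every .text access is a separate round trip to the browser driver. With ~400 rows of ~10 fields each, your loop makes roughly 400 × 11 = 4,400 such calls, which puts 2.5 minutes at roughly 34 ms per call. It looks like .page_source gives the entire contents of the page in Selenium, which I found at stackoverflow.com/questions/35486374/…. If reading all the chunks on the page one at a time is killing your performance (and it probably is), reading the whole page once and processing it offline will be oodles faster.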
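A sketch of the whole pipeline, assuming lxml is installed. It swaps in lxml instead of BeautifulSoup because lxml can evaluate the question's XPath expressions unchanged. The names rows_via_xpath, scrape_to_file, ROW_XPATH and FIELD_XPATH are hypothetical, and "url" is the question's placeholder. In Chrome, page_source usually reflects the DOM the browser built, including the <tbody> the row XPath expects, but that is worth checking on the real page.

```python
import lxml.html

# XPath expressions copied from the question; lxml evaluates them as-is.
ROW_XPATH = '//*[@id="content"]/table[1]/tbody/tr'
FIELD_XPATH = './/*[text()][count(*)=0]'


def rows_via_xpath(html):
    """Parse the HTML string once and return one list of field texts per row."""
    tree = lxml.html.fromstring(html)
    return [
        [field.text_content().strip() for field in row.xpath(FIELD_XPATH)]
        for row in tree.xpath(ROW_XPATH)
    ]


def scrape_to_file(url, out_path):
    """Fetch the page with a single page_source read, then parse offline."""
    # Imported here so rows_via_xpath can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        html = driver.page_source  # one round trip for the whole page
    finally:
        driver.quit()

    with open(out_path, "w", encoding="cp1252") as results_file:
        for fields in rows_via_xpath(html):
            results_file.write("#".join(fields) + "\n")
```

Usage would be something like scrape_to_file("url", path + "results.txt"). Only driver.get and page_source talk to the browser; the ~4,400 per-element calls become in-process lxml lookups.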
