Reputation: 11
My problem is that my current program scrapes 1000 games on Steam (including title, review, author, etc.), and this takes 19 minutes (1140 seconds) for 1000 reviews. However, for 100 reviews it takes only 11.5 seconds. My goal is to get 1000 reviews down to 115 seconds, so that each iteration takes the same amount of time (about 0.1 seconds per iteration). My current code is listed below.
for y in range(100):  # 200 best time is 32 seconds / 2000 is 19 min 7 sec
    container = browser.find_element(By.ID, "search_resultsRows")
    urls_needed = container.find_elements_by_xpath("./child::*")[y]
    # links.append(urls_needed[y])
    game_title = browser.find_elements_by_class_name("title")[y].text
    release_date = browser.find_elements_by_css_selector(
        "div.col.search_released.responsive_secondrow"
    )[y].text
    discount = browser.find_elements_by_css_selector(
        "div.col.search_discount.responsive_secondrow"
    )[y].text
    price = browser.find_elements_by_css_selector(
        "div.col.search_price.responsive_secondrow"
    )[y].text
    game_writer.writerow(
        {
            "Title": game_title,
            "Release Date": release_date,
            "Discount": discount,
            "Price": price,
            "URL": urls_needed.get_attribute("href"),
        }
    )
    if y < 100:
        browser.execute_script("window.scrollBy(0, 50);")
The problem is that I use find_elements so that it doesn't scrape the same game 1000 times. I need a way to use find_element inside the loop, so the find_elements list isn't rebuilt on every iteration, while still getting the second, third, and subsequent games in that list.
The link to the page I'm scraping is https://store.steampowered.com/search/?filter=topsellers
EDIT: Beautiful Soup does not work, to my knowledge, since I need to scroll down the page to load all of the content. The page loads about 50 games at a time and must be scrolled to the bottom to load more each time.
Upvotes: 0
Views: 82
Reputation: 2461
You're searching the DOM too much. You only need to do ONE find_elements call: after you're done loading content. Do all your scrolling first, and THEN do the find_elements.
Otherwise you re-search the DOM on every iteration (re-matching elements you've already scraped), so the cost of each search grows with the DOM and the total time blows up.
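A sketch of that ordering (scroll until everything is loaded, then one find_elements and per-row lookups), assuming the same page structure and CSV columns as the question; the selector names are taken from the question's code, so treat them as assumptions. The Selenium calls are kept under __main__:

```python
# Sketch: scroll first, then ONE find_elements, then read each field
# relative to its row (no re-scan of the whole DOM per game).
import csv

def make_record(title, released, discount, price, url):
    # Pure helper mirroring the question's CSV columns.
    return {"Title": title, "Release Date": released,
            "Discount": discount, "Price": price, "URL": url}

if __name__ == "__main__":
    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    browser = webdriver.Chrome()
    browser.get("https://store.steampowered.com/search/?filter=topsellers")

    # Phase 1: scroll until ~1000 rows exist (the page appends ~50 per load).
    while len(browser.find_elements(By.CSS_SELECTOR,
                                    "#search_resultsRows > a")) < 1000:
        browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(1)  # crude wait for the infinite scroll to append rows

    # Phase 2: a single find_elements, then cheap per-row lookups.
    rows = browser.find_elements(By.CSS_SELECTOR, "#search_resultsRows > a")
    with open("games.csv", "w", newline="") as f:
        fields = ["Title", "Release Date", "Discount", "Price", "URL"]
        game_writer = csv.DictWriter(f, fieldnames=fields)
        game_writer.writeheader()
        for row in rows[:1000]:
            game_writer.writerow(make_record(
                row.find_element(By.CLASS_NAME, "title").text,
                row.find_element(By.CSS_SELECTOR, "div.search_released").text,
                row.find_element(By.CSS_SELECTOR, "div.search_discount").text,
                row.find_element(By.CSS_SELECTOR, "div.search_price").text,
                row.get_attribute("href"),
            ))
```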
But really you could just do this with requests and hit the paging URL directly: https://store.steampowered.com/search/results/?query&start=50&count=50&dynamic_data=&sort_by=_ASC&snr=1_7_7_7000_7&filter=topsellers&infinite=1
This response tells you how many total results there are and gives you HTML you can scrape with Beautiful Soup. That eliminates the UI/browser entirely and will be much faster.
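A minimal sketch of the requests approach, assuming the endpoint returns JSON with a "total_count" field and a "results_html" fragment, and assuming the Steam row classes (`a.search_result_row`, `span.title`, `div.search_released`, `div.search_discount`, `div.search_price`) — verify those against the live response before relying on them:

```python
# Fetch the paging endpoint in steps of 50 and parse each HTML fragment,
# mirroring the question's CSV columns. Selector names are assumptions.
import requests
from bs4 import BeautifulSoup

SEARCH_URL = ("https://store.steampowered.com/search/results/"
              "?query&start={start}&count=50&dynamic_data="
              "&sort_by=_ASC&snr=1_7_7_7000_7&filter=topsellers&infinite=1")

def _text(row, selector):
    # Text of the first match, or "" if the element is missing in this row.
    node = row.select_one(selector)
    return node.get_text(strip=True) if node else ""

def parse_results(results_html):
    """Extract one record per result row from a results_html fragment."""
    soup = BeautifulSoup(results_html, "html.parser")
    return [{
        "Title": _text(row, "span.title"),
        "Release Date": _text(row, "div.search_released"),
        "Discount": _text(row, "div.search_discount"),
        "Price": _text(row, "div.search_price"),
        "URL": row.get("href", ""),
    } for row in soup.select("a.search_result_row")]

def fetch_page(start):
    resp = requests.get(SEARCH_URL.format(start=start), timeout=10)
    resp.raise_for_status()
    return resp.json()  # assumed keys: success, results_html, total_count

if __name__ == "__main__":
    games = []
    for start in range(0, 1000, 50):  # 20 requests of 50 games each
        games.extend(parse_results(fetch_page(start)["results_html"]))
    print(len(games))
```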
Upvotes: 1
Reputation: 72
I don't know if this is really what you want, but if you want it faster, one thing you can do is use multithreading: you create several threads, and each thread searches a different range of titles:
import threading

def search_games(index_range):  # avoid shadowing the builtin "range"
    for y in index_range:
        game_title = browser.find_elements_by_class_name("title")[y].text
        release_date = browser.find_elements_by_css_selector(
            "div.col.search_released.responsive_secondrow"
        )[y].text
        discount = browser.find_elements_by_css_selector(
            "div.col.search_discount.responsive_secondrow"
        )[y].text
        price = browser.find_elements_by_css_selector(
            "div.col.search_price.responsive_secondrow"
        )[y].text

# you can create as many threads as you want
job_thread1 = threading.Thread(target=search_games, args=(range1,))
job_thread1.start()
Note: don't forget to pass args as an iterable — a one-element tuple here, hence the trailing comma in (range1,).
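The range-splitting and thread setup can be sketched as below, with a stub worker standing in for search_games (the helper names chunk_ranges and worker are mine, not from any library):

```python
# Split range(total) into contiguous per-thread ranges and run one thread
# per chunk; a lock guards the shared results list.
import threading

def chunk_ranges(total, n_threads):
    """Split range(total) into up to n_threads contiguous ranges."""
    size = (total + n_threads - 1) // n_threads  # ceiling division
    return [range(i, min(i + size, total)) for i in range(0, total, size)]

results = []
lock = threading.Lock()

def worker(index_range):
    local = [y * 2 for y in index_range]  # stand-in for per-index scraping
    with lock:                            # serialize writes to shared state
        results.extend(local)

threads = [threading.Thread(target=worker, args=(r,))
           for r in chunk_ranges(10, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```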
Upvotes: 0