Kevin Ilondo

Reputation: 35

Beautiful Soup only scrapes data after running the script multiple times

I am learning how to scrape data from a website using Python and Beautiful Soup. With the code below, I'm trying to get data from a South African e-commerce website and save important information, such as the image URL and price, in an array. The code works, but the results are inconsistent: I have to run the script 3 to 4 times before I get the expected result. On the first run, I usually get an empty array.

I would appreciate it if someone could tell me what is wrong with my code or what might be causing the issue.

def get_items(url):

    driver.get(url)
    results = []
    content = driver.page_source
    soup = BeautifulSoup(content, features="lxml")
    driver.quit()

    for element in soup.find_all(attrs="product-card"):
        
        image = element.find('img')
        link = element.find('a', {'class': 'product-link'})
        description_element = element.find('h4', {'class': 'product-desc'})
        description = description_element.string
        
        price = element.findAll('span', {'class': 'product-price'})
        
        results.append({
            'image': image['src'],
            'description': description.strip(),
            'link': link['href'],
            'price': price[0].string,
            
        })

    print(results)

Upvotes: 2

Views: 197

Answers (1)

Themis

Reputation: 569

This looks like a classic race condition. I think you're reading the HTML from Selenium before the browser has finished executing the page's JavaScript and rendering the whole page, so the page load is racing against your code reaching the driver.page_source line. Try importing time and adding time.sleep(4) right after the call to driver.get(url) to pause for 4 seconds.
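Something like this is what I mean by the sleep. It's only a rough sketch: fetch_rendered_html and delay_seconds are names I made up for illustration, and driver is assumed to be the Selenium WebDriver you already create elsewhere in your script.

```python
import time


def fetch_rendered_html(driver, url, delay_seconds=4.0):
    """Load `url`, pause, then return the page source.

    `driver` is assumed to be a Selenium WebDriver; the function only
    relies on its .get() method and .page_source attribute.
    """
    driver.get(url)
    # Crude fixed pause: gives the page's JavaScript time to render
    # before the HTML is read. The 4 seconds is a guess, not a guarantee.
    time.sleep(delay_seconds)
    return driver.page_source
```

You would then pass the returned string to BeautifulSoup as before. A fixed delay is easy to try, but it is either too long (wasted time) or too short (the empty result comes back) depending on how fast the page loads on a given run.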

If this works for you (it might not, and I could be totally wrong, since I don't know which page you're scraping and couldn't verify my answer myself), you may want to look into Selenium's WebDriverWait class. It lets you wait until a given element is found on the page, so you know the content you need has loaded before you start looking for information in it.

Upvotes: 1
