dergky

Reputation: 105

For loops while using Selenium for web scraping in Python

I am attempting to web-scrape info off of the following website: https://www.axial.net/forum/companies/united-states-family-offices/

I am trying to scrape the description for each family office, so the pages I need to scrape are "https://www.axial.net/forum/companies/united-states-family-offices/" + insert_company_name.

So I wrote the following code to test the program for just one page:

from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome('insert_path_here/chromedriver')
driver.get("https://network.axial.net/company/ansaco-llp")
page_source = driver.page_source
soup2 = soup(page_source,"html.parser")
soup2.findAll('axl-teaser-description')[0].text

This works for a single page, as long as the description doesn't have a "show full description" drop-down button. I will save that case for another question.

I wrote the following loop:

# Note: lst2 has all the company names. I made sure they match the webpage.
lst3 = []
for key in lst2[1:]:
    driver.get("https://network.axial.net/company/" + key.lower())
    page_source = driver.page_source

    for handle in driver.window_handles:
        driver.switch_to.window(handle)
    word_soup = soup(page_source, "html.parser")

    if word_soup.findAll('axl-teaser-description') == []:
        lst3.append('null')
    else:
        c = word_soup.findAll('axl-teaser-description')[0].text
        lst3.append(c)
print(lst3)

When I run the loop, all of the values come out as "null", even the ones without "click for full description" buttons.

I edited the loop to print out "word_soup" instead, and the page is different from what I get when I run the code without a loop: it does not contain the description text.

I don't understand why a loop would cause that but apparently it does. Does anyone know how to fix this problem?

Upvotes: 1

Views: 87

Answers (2)

dergky

Reputation: 105

Found a solution: pause the program for 3 seconds after driver.get, so the page has time to finish loading before page_source is read:

import time

lst3 = []
for key in lst2[1:]:
    driver.get("https://network.axial.net/company/" + key.lower())
    time.sleep(3)  # give the page time to render before reading the source
    page_source = driver.page_source

    word_soup = soup(page_source, "html.parser")

    if word_soup.findAll('axl-teaser-description') == []:
        lst3.append('null')
    else:
        c = word_soup.findAll('axl-teaser-description')[0].text
        lst3.append(c)
print(lst3)
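A fixed 3-second pause can be too short on a slow load and wasteful on a fast one. As a rough sketch of an alternative, the loop could poll the live page until the description element has text, up to a timeout. The helper names `wait_for_text` and `scrape_descriptions` are made up for illustration, and the code assumes a Selenium 4 style driver where `find_elements(by, value)` accepts the locator string "tag name" (the value behind `By.TAG_NAME`). Selenium also ships its own `WebDriverWait` for this kind of explicit wait, which may be the better choice in practice.

```python
import time


def wait_for_text(driver, tag_name, timeout=10.0, poll=0.5):
    """Poll the live DOM until an element named `tag_name` has non-empty text.

    Returns the stripped text, or None if nothing shows up within `timeout`
    seconds. Hypothetical helper, not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        # "tag name" is the locator strategy string used by By.TAG_NAME.
        for element in driver.find_elements("tag name", tag_name):
            text = element.text.strip()
            if text:
                return text
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)


def scrape_descriptions(driver, names, timeout=10.0, poll=0.5,
                        base_url="https://network.axial.net/company/"):
    """Visit each company page and collect its description, or 'null'."""
    results = []
    for name in names:
        driver.get(base_url + name.lower())
        text = wait_for_text(driver, "axl-teaser-description", timeout, poll)
        results.append(text if text is not None else "null")
    return results
```

With the variables from the question, this would be called as `lst3 = scrape_descriptions(driver, lst2[1:])`. Pages that never show a description still cost the full timeout, so the timeout is worth tuning.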

Upvotes: 1

Alarm-1202

Reputation: 120

I see that the page uses JavaScript to generate the text, meaning it isn't in the HTML the server first sends, which is weird but OK. I don't quite understand why you're iterating through and switching to every Selenium window you have open, but you definitely won't find the description if you hand the page source to BeautifulSoup before that JavaScript has run.
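To make the timing issue concrete, here is a small standard-library sketch. The two HTML strings are invented stand-ins for what the page source might look like before and after the JavaScript has run; they are not copied from the real site, and `first_description` is a hypothetical helper that mimics the question's "first match or 'null'" logic.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside every element with a given tag name."""

    def __init__(self, tag_name):
        super().__init__()
        self.tag_name = tag_name
        self.depth = 0      # > 0 while inside a matching element
        self.texts = []     # one entry per matching top-level element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag_name:
            if self.depth == 0:
                self._buffer = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag_name and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.texts.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self._buffer.append(data)


def first_description(html, tag_name="axl-teaser-description"):
    """Return the first matching element's text, or 'null' if there is none."""
    parser = TagTextCollector(tag_name)
    parser.feed(html)
    parser.close()
    return parser.texts[0] if parser.texts else "null"


# Invented snapshots: the element only exists once the script has rendered it.
BEFORE_RENDER = "<html><body><axl-root></axl-root></body></html>"
AFTER_RENDER = (
    "<html><body><axl-root><axl-teaser-description>"
    "Example family office description."
    "</axl-teaser-description></axl-root></body></html>"
)

print(first_description(BEFORE_RENDER))  # null
print(first_description(AFTER_RENDER))   # Example family office description.
```

The parsing code is identical in both cases; only the moment at which the page source was captured differs.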

Honestly, I'd look for a better website if you can. Otherwise, you'll have to do it with Selenium, which is inefficient and horrible.

Upvotes: 0
