ronaldo

Not able to Scrape data using BeautifulSoup

I'm using Selenium to log in to a webpage and retrieve the page for scraping, and that part works. I searched the HTML for the table I want to scrape; here it is:

<table cellspacing="0" class=" tablehasmenu table hoverable sensors" id="table_devicesensortable">

This is the script:

from bs4 import BeautifulSoup

rawpage = driver.page_source  # storing the webpage in a variable
souppage = BeautifulSoup(rawpage, 'html.parser')  # parsing the webpage
tbody = souppage.find('table', attrs={'id': 'table_devicesensortable'})  # scraping

I'm able to get the parsed webpage in the souppage variable, but I'm not able to scrape the table and store it in the tbody variable.
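A common cause of this symptom is that driver.page_source is only a snapshot: if the table is inserted by JavaScript after the snapshot is taken, no parser can find it in that string. The sketch below illustrates the idea using the standard library's html.parser as a stand-in for BeautifulSoup, so it runs without extra packages. TableIdFinder and has_table are hypothetical helpers, and the two HTML strings are made-up snapshots, not the real page.

```python
from html.parser import HTMLParser


class TableIdFinder(HTMLParser):
    """Records whether a <table> with the given id appears in the markup."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag
        if tag == "table" and dict(attrs).get("id") == self.table_id:
            self.found = True


def has_table(html, table_id):
    finder = TableIdFinder(table_id)
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical snapshots: before and after a script inserts the table.
before_js = '<html><body><div id="sensors"></div></body></html>'
after_js = (
    '<html><body><div id="sensors">'
    '<table cellspacing="0" class=" tablehasmenu table hoverable sensors" '
    'id="table_devicesensortable"><tr><td>ok</td></tr></table>'
    "</div></body></html>"
)

print(has_table(before_js, "table_devicesensortable"))  # False
print(has_table(after_js, "table_devicesensortable"))   # True
```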

Upvotes: 1

Answers (3)

undetected Selenium

As per the HTML you have shared, to scrape the <table> you have to induce WebDriverWait with the expected_conditions clause set to presence_of_element_located. To achieve that you can use either of the following code blocks:

  • Using class:

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//table[@class=' tablehasmenu table hoverable sensors' and @id='table_devicesensortable']")))
    rawpage = driver.page_source  # storing the webpage in a variable
    souppage = BeautifulSoup(rawpage, "html.parser")  # parsing the webpage
    # BeautifulSoup splits class into tokens, so the raw leading space would prevent a match
    tbody = souppage.find("table", {"class": "tablehasmenu table hoverable sensors"})  # scraping
    
  • Using id:

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    
    WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//table[@class=' tablehasmenu table hoverable sensors' and @id='table_devicesensortable']")))
    rawpage = driver.page_source  # storing the webpage in a variable
    souppage = BeautifulSoup(rawpage, "html.parser")  # parsing the webpage
    tbody = souppage.find("table", {"id": "table_devicesensortable"})  # scraping
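WebDriverWait(...).until(...) repeatedly checks a condition instead of sleeping for a fixed time. The sketch below is a simplified, stdlib-only stand-in for that behaviour, not Selenium's implementation: wait_until, fake_page_source and source_with_table are hypothetical names, and the fake page source simulates a table that appears on the third check. The 0.5-second default poll interval mirrors WebDriverWait's default; Selenium raises its own TimeoutException, whereas this sketch uses the built-in TimeoutError.

```python
import time


def wait_until(condition, timeout=20.0, poll_frequency=0.5):
    """Call `condition` until it returns something truthy, then return that value.

    Raises TimeoutError if `timeout` seconds pass without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_frequency)


calls = {"n": 0}


def fake_page_source():
    # Hypothetical page: the table only exists from the third request on,
    # mimicking a table that JavaScript inserts after the page loads.
    calls["n"] += 1
    if calls["n"] >= 3:
        return '<table id="table_devicesensortable"><tr><td>ok</td></tr></table>'
    return "<div>loading...</div>"


def source_with_table():
    # Return the page source once the table is present, else None (keep waiting).
    src = fake_page_source()
    return src if 'id="table_devicesensortable"' in src else None


html = wait_until(source_with_table, timeout=5, poll_frequency=0.01)
print(calls["n"])  # 3
```

Unlike a fixed time.sleep(4), a polling wait returns as soon as the table shows up and fails loudly if it never does.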
    

Upvotes: 0

Andersson

The required table might be generated dynamically, so you need to wait until it is present on the page:

from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait

tbody = wait(driver, 10).until(EC.presence_of_element_located((By.ID, "table_devicesensortable")))

Also note that there is no need to use BeautifulSoup, as Selenium has enough built-in methods and properties to do the same job for you, e.g.

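# find_elements_by_tag_name() was deprecated and later removed in Selenium 4;
# find_elements(By.TAG_NAME, ...) is the current equivalent
headers = tbody.find_elements(By.TAG_NAME, "th")
rows = tbody.find_elements(By.TAG_NAME, "tr")
cells = tbody.find_elements(By.TAG_NAME, "td")
cell_values = [cell.text for cell in cells]
# etc.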
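The cell_values list above is flat. If you need the table row by row and already have its markup (for example from tbody.get_attribute("outerHTML") in Selenium, or str(tbody) for a BeautifulSoup tag), a stdlib-only sketch like the following groups cell text per <tr>. TableRows and table_to_rows are hypothetical helpers, the sample rows are made up, and the parser assumes simple, well-formed markup without nested tables.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collects <th>/<td> cell text, grouped by <tr>.

    Assumes simple, well-formed markup: no nested tables and explicit
    closing </td>/</th> tags.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            if not self.rows:  # cell outside any <tr>: start a row for it
                self.rows.append([])
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_rows(table_html):
    parser = TableRows()
    parser.feed(table_html)
    parser.close()
    return parser.rows


# Made-up sample markup; a real sensor table will differ.
sample = (
    '<table id="table_devicesensortable">'
    "<tr><th>Sensor</th><th>Status</th></tr>"
    "<tr><td>Ping</td><td>Up</td></tr>"
    "<tr><td>CPU Load</td><td>Warning</td></tr>"
    "</table>"
)
print(table_to_rows(sample))
# [['Sensor', 'Status'], ['Ping', 'Up'], ['CPU Load', 'Warning']]
```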

Upvotes: 3

ronaldo

I searched Stack Overflow for the issue and came across this post:

BeautifulSoup returning none when element definitely exists

Reading the answer provided by luiyezheng gave me the hint that the data might be fetched dynamically. So the table was probably created dynamically after the page loaded, which is why I was unable to find it.

So, the workaround is to add a delay before storing the webpage. The code goes like this:

import time
from bs4 import BeautifulSoup

time.sleep(4)  # give the page's scripts time to create the table
rawpage = driver.page_source  # storing the webpage in a variable
souppage = BeautifulSoup(rawpage, "html.parser")  # parsing the webpage
tbody = souppage.find("table", {"id": "table_devicesensortable"})  # scraping

I hope it helps someone.

Upvotes: 0