adc admin

Reputation: 23

Scraping a table from a web page using Selenium in Python: AttributeError: 'NoneType' object has no attribute 'find_all'

I am trying to scrape a table from a web page using Selenium in Python, but I get the error AttributeError: 'NoneType' object has no attribute 'find_all'. The page is a form, and the table has no class attribute. How can I modify the code to scrape the data?

import time

from selenium import webdriver
from selenium.webdriver.support.ui import Select
from webdriver_manager.firefox import GeckoDriverManager

driver = webdriver.Firefox(executable_path=GeckoDriverManager().install())

# Web page URL
driver.get("http://mnregaweb4.nic.in/netnrega/dynamic_work_details.aspx?page=S&lflag=eng&state_name=KERALA&state_code=16&fin_year=2020-2021&source=national&Digest=s5wXOIOkT98cNVkcwF6NQA")

# Locate the District dropdown and select an option by value
x = driver.find_element_by_id('ctl00_ContentPlaceHolder1_ddl_dist')
drop = Select(x)
drop.select_by_value("1613")
time.sleep(4)  # give the page time to update

# Locate the Block dropdown and select an option by value
x = driver.find_element_by_id('ctl00_ContentPlaceHolder1_ddl_blk')
drop = Select(x)
drop.select_by_value("1613001")
time.sleep(4)

# Locate the GP (Panchayat) dropdown and select an option by value
x = driver.find_element_by_id('ctl00_ContentPlaceHolder1_ddl_pan')
drop = Select(x)
drop.select_by_value("1613001001")
time.sleep(4)

# Submit the form
search_button = driver.find_element_by_id("ctl00_ContentPlaceHolder1_Button1")
search_button.click()

from bs4 import BeautifulSoup
doc = BeautifulSoup(driver.page_source, "html.parser")
# The AttributeError is raised here: doc.find() returns None
rows = doc.find('table', border='.5').find_all('td', attrs={'class': None})

works = []
print(works)

Upvotes: 0

Views: 334

Answers (2)

AmineBTG

Reputation: 697

Try the pandas read_html function, passing it the page source once the table has loaded:

time.sleep(5) 

# from bs4 import BeautifulSoup
# doc = BeautifulSoup(driver.page_source, "html.parser")
# rows = doc.find('table', {"border":'.5'}).find_all('td', attrs={'class': None})
# print(rows[:10])

import pandas as pd
# read_html returns a list of DataFrames, one per table found in the page
tables = pd.read_html(driver.page_source)
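
Since read_html returns a list with one DataFrame per table it finds, it can help to pick the results table by its content rather than by position. The following is only a minimal sketch: the HTML string is a hypothetical stand-in for driver.page_source, and its cell values are made up for illustration. The match and header parameters are standard read_html options, and wrapping the string in StringIO avoids the deprecation warning that newer pandas versions emit for literal HTML.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for driver.page_source: a layout table plus the data table
html = """
<table><tr><td>Select District</td></tr></table>
<table border=".5">
  <tr><td><b>SNo.</b></td><td><b>Work Code</b></td><td><b>Work Name</b></td></tr>
  <tr><td>1</td><td>WC-001</td><td>Road repair</td></tr>
  <tr><td>2</td><td>WC-002</td><td>Pond cleaning</td></tr>
</table>
"""

# match= keeps only tables whose text matches the pattern;
# header=0 uses the first row as the column names
tables = pd.read_html(StringIO(html), match="Work Code", header=0)
works = tables[0]

print(works.columns.tolist())
print(len(works))
```

With the real page, StringIO(driver.page_source) would take the place of StringIO(html). An HTML parser that pandas supports (lxml, or bs4 with html5lib) needs to be installed.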

Upvotes: 1

AmineBTG

Reputation: 697

This works with the following adjustments:
1. Add a 5-second sleep before reading the table, to give it time to load.
2. Pass the attribute filter to the doc.find function as a dictionary.

time.sleep(5) 

from bs4 import BeautifulSoup
doc = BeautifulSoup(driver.page_source, "html.parser")
rows = doc.find('table', {"border":'.5'}).find_all('td', attrs={'class': None})

print(rows[:10])

Output:

[<td align="center" width="2%">
<b>SNo.</b>
</td>, <td align="center">
<b>District Name</b>
</td>, <td align="center">
<b>Block Name</b>
</td>, <td align="center">
<b>Panchayat Name</b>
</td>, <td align="center">
<b>Work Start Fin Year</b>
</td>, <td align="center">
<b>Work Status</b>
</td>, <td align="center">
<b>Work Code</b>
</td>, <td align="center">
<b>Work Name</b>
</td>, <td align="center">
<b>Master Work Category Name </b>
</td>, <td align="center">
<b>Work Category Name </b>
</td>]

Upvotes: 1
