Reputation: 107
I am scraping this page: http://mahaprantikssksamaj.com/ssk-samaj-maharashtras.aspx. I collect the valid URLs from it, then request each one and scrape the data from the page it leads to.
The data on each page is stored in a table, and I am getting this error: "AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" My code is here:
from bs4 import BeautifulSoup
import requests
r = requests.get('http://mahaprantikssksamaj.com/ssk-samaj-maharashtras.aspx')
soup = BeautifulSoup(r.text, 'html.parser')
for i in range(36):
    print(i)
    url = 'http://mahaprantikssksamaj.com/ssk-prantik-members.aspx?id={}'.format(i)
    r = requests.get(url)
    web = BeautifulSoup(r.content,"html.parser")
    table= web.findAll("table",id="DGORG")
    print(table)
    table_body = table.find('tbody')
    rows = table_body.find_all('tr')
    for tr in rows:
        cols = tr.find_all('td')
        for td in cols:
            print (td)
print(table) gives this output:
<div class="memcss">
<table border="1" style="width:90%;padding:10px;margin:0px 0px 20px 20px;box-shadow:2px 2px 2px #000000">
<tr>
<td colspan="2" style="text-align:center"><h5>Mr. Jaydeo Mahadeosa Pawar</h5></td>
</tr>
<tr>
<td colspan="2" style="text-align:center"><h6>Secretory</h6></td>
</tr>
<tr>
<td style="width:25%;height:30px;text-align:right">Address : </td>
<td> Pune</td>
</tr>
<tr>
<td style="width:20%;height:30px;text-align:right">City : </td>
<td> Pune</td>
</tr>
<tr>
<td style="width:20%;height:30px;text-align:right">Mobile : </td>
<td> </td>
</tr>
</table>
</div>
</td>
</tr><tr>
<td>
I am trying to store only the name, designation, address and mobile number in a CSV file. Can anyone please help me see where I am going wrong? Thanks in advance.
Upvotes: 0
Views: 93
Reputation: 22440
To get all the content from each of the tables reached through the view members links on the landing page, you can use the following approach:
from bs4 import BeautifulSoup
from urllib.parse import urljoin
import requests
link = "http://mahaprantikssksamaj.com/ssk-samaj-maharashtras.aspx"
res = requests.get(link)
soup = BeautifulSoup(res.text, 'html.parser')
for item in soup.select("a[style$='text-decoration:none']"):
    req = requests.get(urljoin(link,item.get("href")))
    sauce = BeautifulSoup(req.text,"html.parser")
    for elem in sauce.select(".memcss table tr"):
        data = [item.get_text(strip=True) for item in elem.select("td")]
        print(data)
The output looks like:
['Shri. Narsinhasa Narayansa Kolhapure']
['Chairman']
['Address :', 'Ahamadnagar']
['City :', 'Ahamadnagar']
['Mobile :', '2425577']
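Since the question asks for a CSV file, here is a minimal sketch of how row lists like the ones above could be turned into one CSV record per member. The helper names rows_to_record and write_members and the FIELDS list are hypothetical, and the sketch assumes that the first single-cell row is the name, the second is the designation, and that two-cell rows are label/value pairs. It uses only the standard library and the sample rows printed above.

```python
import csv
import io

# Columns requested in the question (the City rows are ignored on purpose).
FIELDS = ["name", "designation", "address", "mobile"]


def rows_to_record(rows):
    """Build one member dict from the row lists of a single member table."""
    record = dict.fromkeys(FIELDS, "")
    # Assumption: single-cell rows are the name, then the designation.
    singles = [row[0] for row in rows if len(row) == 1]
    if singles:
        record["name"] = singles[0]
    if len(singles) > 1:
        record["designation"] = singles[1]
    # Two-cell rows look like ['Address :', 'Ahamadnagar'].
    for row in rows:
        if len(row) == 2:
            key = row[0].rstrip(": ").lower()
            if key in record:
                record[key] = row[1]
    return record


def write_members(records, out):
    """Write the member dicts to any file-like object as CSV."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


sample_rows = [
    ['Shri. Narsinhasa Narayansa Kolhapure'],
    ['Chairman'],
    ['Address :', 'Ahamadnagar'],
    ['City :', 'Ahamadnagar'],
    ['Mobile :', '2425577'],
]
buffer = io.StringIO()
write_members([rows_to_record(sample_rows)], buffer)
print(buffer.getvalue())
```

To use this with the scraping loop, you would collect the rows of each member table separately (for example by selecting ".memcss table" first and then its tr elements) and pass a real file opened with newline="" instead of the StringIO buffer.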
Upvotes: 1
Reputation: 834
from bs4 import BeautifulSoup
import requests
r = requests.get('http://mahaprantikssksamaj.com/ssk-samaj-maharashtras.aspx')
soup = BeautifulSoup(r.text, 'html.parser')
for i in range(36):
    print(i)
    url = 'http://mahaprantikssksamaj.com/ssk-prantik-members.aspx?id={}'.format(i)
    r = requests.get(url)
    web = BeautifulSoup(r.content, "html.parser")
    # find returns a single element (or None); findAll returns a list of matches
    table = web.find("table", id="DGORG")
    print(table)
    if table is None:
        # no such table on this page, so skip it
        continue
    # the page source has no tbody, so read the rows from the table itself
    rows = table.find_all('tr')
    for tr in rows:
        cols = tr.find_all('td')
        for td in cols:
            print(td)
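Changes:
In the line table= web.findAll("table",id="DGORG"), use find instead of findAll. findAll returns a ResultSet (a list of matches), which has no find method, while find returns the first matching element.
Also, when we inspect the website in the browser, the table appears to have a tbody, but it may not be present in the actual page source. To confirm that, open the view page source option for the page.
See also the related question: How to get tbody from table from python beautiful soup?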
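The two points above can be checked offline with a small sketch. The markup below is a made-up fragment that only mirrors the structure shown in the question (a table with id DGORG and no tbody); it is not the real page. It shows that find_all returns a list-like ResultSet while find returns a single element, and that html.parser does not add a tbody that is absent from the source.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with no <tbody>, shaped like the table in the question.
html = """
<table id="DGORG">
<tr><td>Address : </td><td> Pune</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet (list-like), even when there is one match.
matches = soup.find_all("table", id="DGORG")
# find returns the first matching element, or None if nothing matches.
table = soup.find("table", id="DGORG")

# html.parser keeps the markup as written, so no <tbody> appears here.
tbody = table.find("tbody")
# Fall back to the table itself when there is no <tbody>.
container = tbody if tbody is not None else table
cells = [td.get_text(strip=True) for td in container.find_all("td")]
print(len(matches), tbody, cells)
```

Falling back to the table when tbody is missing keeps the same code working whether or not the source includes the element.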
Upvotes: 0