I am attempting a simple scrape of an HTML table using BeautifulSoup with the following:
import urllib
import urllib.request
from bs4 import BeautifulSoup

def make_soup(url):
    page = urllib.request.urlopen(url)
    sdata = BeautifulSoup(page, 'html.parser')
    return sdata

url = 'http://www.satp.org/satporgtp/countries/pakistan/database/bombblast.htm'
soup = make_soup(url)
table = soup.findAll('table', attrs={'class':'pagraph1'})
table = table[0]
trows = table.findAll('tr')
bbdata_ = []
bbdata = []
for trow in trows:
    bbdata_ = trow.findAll('td')
    bbdata = [ele.text.strip() for ele in bbdata_]
print(bbdata)
However, I only get the last row of the table:
['Total*', '369', '1032+']
All of the data is present in trows, so I must be building my loop incorrectly, but I am not sure how.
Your problem is here:
bbdata = [ele.text.strip() for ele in bbdata_]
You want to append each row to the list instead (or use extend if you want a single flat list of cells rather than one list per row):
bbdata.append([ele.text.strip() for ele in bbdata_])
You are rebinding bbdata to a new list on every pass through the loop, which is why it ends up holding only the final row.
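A minimal, self-contained sketch of the corrected loop. So that it runs without network access, it parses a small made-up HTML string (SAMPLE_HTML is a hypothetical stand-in, not the satp.org page) and uses find_all, the current name for the older findAll alias. It also shows extend next to append so the difference in output shape is visible.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the live page; assumes the target table
# carries class="pagraph1", as in the question.
SAMPLE_HTML = """
<table class="pagraph1">
  <tr><td>first</td><td>1</td><td>2</td></tr>
  <tr><td>second</td><td>3</td><td>4</td></tr>
  <tr><td>last</td><td>5</td><td>6</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find_all("table", attrs={"class": "pagraph1"})[0]

bbdata = []  # one list per row, created once before the loop
flat = []    # alternative: every cell in a single flat list
for trow in table.find_all("tr"):
    cells = [ele.text.strip() for ele in trow.find_all("td")]
    bbdata.append(cells)  # keeps earlier rows instead of rebinding bbdata
    flat.extend(cells)    # adds this row's cells one by one

print(bbdata)  # [['first', '1', '2'], ['second', '3', '4'], ['last', '5', '6']]
print(flat)    # ['first', '1', '2', 'second', '3', '4', 'last', '5', '6']
```

With append, bbdata[-1] is still the last row, but the earlier rows are no longer lost.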