Reputation: 79
I am trying to gather information from a website that has a database for ships.
I was trying to get the information with BeautifulSoup. But at the moment it does not seem to be working. I tried searching the web and tried different solutions, but did not manage to get the code working.
I was wondering whether I have to change the line
table = soup.find_all("table", { "class" : "table1" })
as there are 5 tables with class='table1', but my code only finds 1.
Do I have to create a loop for the tables? When I tried this, I could not get it working. Also, the next line,
table_body = table.find('tbody')
gives an error:
AttributeError: 'ResultSet' object has no attribute 'find'
I suspect this is because, in BeautifulSoup's source code, ResultSet subclasses list, so it has no find method of its own. Do I have to iterate over that list?
from urllib import urlopen
shipUrl = 'http://www.veristar.com/portal/veristarinfo/generalinfo/registers/seaGoingShips?portal:componentId=p_efff31ac-af4c-4e89-83bc-55e6d477d131&interactionstate=JBPNS_rO0ABXdRAAZudW1iZXIAAAABAAYwODkxME0AFGphdmF4LnBvcnRsZXQuYWN0aW9uAAAAAQAYc2hpcFNlYXJjaFJlc3VsdHNTZXRTaGlwAAdfX0VPRl9f&portal:type=action&portal:isSecure=false'
shipPage = urlopen(shipUrl)
from bs4 import BeautifulSoup
soup = BeautifulSoup(shipPage)
table = soup.find_all("table", { "class" : "table1" })
print table
table_body = table.find('tbody')
rows = table_body.find_all('tr')
for tr in rows:
    cols = tr.find_all('td')
    for td in cols:
        print td
    print
Upvotes: 2
Views: 8889
Reputation: 9038
A couple of things:
1. As Kevin mentioned, you need to use a for loop to iterate through the list returned by find_all.
2. Not all of the tables have a tbody, so you have to wrap the result of the find in a try block.
3. When you print, use the .text attribute so that you print the text value and not the tag itself.
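As a side illustration, the three points above can be checked on a small inline document without hitting the site. This is only a sketch in Python 3 (the code in this thread is Python 2); the HTML string and the names SAMPLE_HTML, tables, and cell_texts are made up for the example and are not taken from the Veristar page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page: two tables with class "table1",
# only the first of which has a <tbody>.
SAMPLE_HTML = """
<table class="table1"><tbody>
  <tr><td>Ship Name:</td><td>SUPERSTAR</td></tr>
</tbody></table>
<table class="table1">
  <tr><td>no body here</td></tr>
</table>
"""

# html.parser does not insert a missing <tbody>, so the second table has none.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Point 1: find_all returns a ResultSet (a list subclass), so iterate over it.
tables = soup.find_all("table", {"class": "table1"})

cell_texts = []
for table in tables:
    body = table.find("tbody")  # None when the table has no <tbody>
    try:
        # Point 2: calling find_all on None raises AttributeError.
        rows = body.find_all("tr")
    except AttributeError:
        cell_texts.append("no tbody")
        continue
    for tr in rows:
        for td in tr.find_all("td"):
            # Point 3: .text is the string content, not the Tag object.
            cell_texts.append(td.text)

print(cell_texts)
```

Calling tables.find('tbody') directly would reproduce the AttributeError from the question, because find is defined on individual tags and not on the list that holds them.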
Here is the revised code:
from urllib import urlopen
from bs4 import BeautifulSoup
shipUrl = 'http://www.veristar.com/portal/veristarinfo/generalinfo/registers/seaGoingShips?portal:componentId=p_efff31ac-af4c-4e89-83bc-55e6d477d131&interactionstate=JBPNS_rO0ABXdRAAZudW1iZXIAAAABAAYwODkxME0AFGphdmF4LnBvcnRsZXQuYWN0aW9uAAAAAQAYc2hpcFNlYXJjaFJlc3VsdHNTZXRTaGlwAAdfX0VPRl9f&portal:type=action&portal:isSecure=false'
shipPage = urlopen(shipUrl)
soup = BeautifulSoup(shipPage)
table = soup.find_all("table", { "class" : "table1" })
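for mytable in table:
    # find() returns None when a table has no <tbody>
    table_body = mytable.find('tbody')
    try:
        rows = table_body.find_all('tr')
        for tr in rows:
            cols = tr.find_all('td')
            for td in cols:
                print td.text
    except AttributeError:
        # table_body was None, so calling find_all on it failed
        print "no tbody"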
This produces the following output:
Register Number:
08910M
IMO Number:
9365398
Ship Name:
SUPERSTAR
Call Sign:
ESIY
.....
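Since the output alternates between a label and its value, the cells could be collected into a dictionary instead of printed. The following is a rough sketch in Python 3 on made-up HTML, assuming each row holds exactly one label cell and one value cell (the output suggests this, but the real page may not guarantee it). It also uses an explicit None check in place of the try block, falling back to rows directly under the table. SAMPLE_HTML, table_rows, and ship are hypothetical names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page layout is assumed, not verified.
SAMPLE_HTML = """
<table class="table1"><tbody>
  <tr><td>Register Number:</td><td>08910M</td></tr>
  <tr><td>IMO Number:</td><td>9365398</td></tr>
</tbody></table>
<table class="table1">
  <tr><td>Ship Name:</td><td>SUPERSTAR</td></tr>
  <tr><td>Call Sign:</td><td>ESIY</td></tr>
</table>
"""


def table_rows(table):
    """Return the <tr> tags of a table, whether or not it has a <tbody>."""
    body = table.find("tbody")
    container = table if body is None else body
    return container.find_all("tr")


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

ship = {}
for table in soup.find_all("table", {"class": "table1"}):
    for tr in table_rows(table):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 2:  # skip rows that are not a label/value pair
            label, value = cells
            ship[label.rstrip(":")] = value

print(ship)
```

With a dictionary, a single field can then be looked up by name, for example ship["IMO Number"].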
Upvotes: 3