python bs4 scrape table gets wrong results

Question

I am trying to scrape this site: http://stcw.marina.gov.ph/find/?c_n=14-111112&opt=stcw and get the table at the bottom. When I scrape it, I get some elements of the first row but nothing from the rest of the table. Here is my code:

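import bs4 as bs
from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib2 import urlopen

urlText = "http://stcw.marina.gov.ph/find/?c_n=14-111112&opt=stcw"
url = urlopen(urlText)
soup = bs.BeautifulSoup(url, "html.parser")
certificates = soup.find('table', class_='table table-bordered')
# Print the text of every cell in every row of the table
for row in certificates.find_all('tr'):
    for td in row.find_all('td'):
        print(td.text)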

The output I get is:

22-20353

                                SHIP SECURITY OFFICER

This is not the whole table. What am I missing?

alecxe · Accepted Answer

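This is another case where the underlying parser makes a difference. Parsers differ in how they repair malformed HTML, so the same page can produce different trees depending on which one BeautifulSoup uses. Switch to lxml or html5lib to see the complete table parsed: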

soup = bs.BeautifulSoup(url, "lxml")      # requires: pip install lxml
# or, alternatively:
soup = bs.BeautifulSoup(url, "html5lib")  # requires: pip install html5lib
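
To check which parser reads the whole table, it can help to parse the same markup with each one and compare how many rows it finds. The following is only a minimal sketch: the helper name `count_rows_per_parser` and the `sample` markup are hypothetical, made up for illustration, and the sample is well-formed, so it does not reproduce the problem on the real page.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_rows_per_parser(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: number of <tr> tags found in markup}.

    Parsers whose package is not installed are skipped.
    """
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            # bs4 raises this when the requested parser is not available
            continue
        counts[name] = len(soup.find_all("tr"))
    return counts


# Hypothetical, well-formed sample; replace it with the downloaded page.
sample = "<table><tr><td>a</td></tr><tr><td>b</td></tr></table>"
counts = count_rows_per_parser(sample)
print(counts)
```

For the real page, read the response once (for example `html = urlopen(urlText).read()`) and pass `html` to the helper, because a response object can only be read once. If the parsers report different row counts, use the one that sees the most rows.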
