dillon

Reputation: 278

Parsing HTML with BeautifulSoup fails to find a table

I am trying to parse the data in this website: http://www.baseball-reference.com/boxes/CHN/CHN201606020.shtml

I want to extract some of the data in the tables, but for some reason I cannot find them. For example, this is what I am trying to do:

from bs4 import BeautifulSoup
import requests

url = 'http://www.baseball-reference.com/boxes/CHN/CHN201606020.shtml'
soup = BeautifulSoup(requests.get(url).text)
soup.find('table', id='ChicagoCubsbatting')

The final line returns nothing even though a table with that id exists in the HTML. Furthermore, len(soup.findAll('table')) returns 1 even though the page contains many tables. I've tried the 'lxml', 'html.parser' and 'html5lib' parsers, and all behave the same way.

What is going on? Why does this not work and what can I do to extract the table?

Upvotes: 0

Views: 382

Answers (1)

宏杰李

Reputation: 12158

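The tables are wrapped in HTML comments, so BeautifulSoup does not parse them as tags. Use soup.find('div', class_='placeholder').next_sibling.next_sibling to get the comment text (called text below), then build a new soup from that text.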

In [35]: new_soup = BeautifulSoup(text, 'lxml')

In [36]: new_soup.table
Out[36]: 
<table class="teams poptip" data-tip="San Francisco Giants at Atlanta Braves">
<tbody>
<tr class="winner">
<td><a href="/teams/SFG/2016.shtml">SFG</a></td>
<td class="right">6</td>
<td class="right gamelink">
<a href="/boxes/ATL/ATL201606020.shtml">Final</a>
</td>
</tr>
<tr class="loser">
<td><a href="/teams/ATL/2016.shtml">ATL</a></td>
<td class="right">0</td>
<td class="right">
</td>
</tr>
</tbody>
</table>
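
If you would rather not depend on where the placeholder div sits, a more general approach is to walk every comment node and re-parse it. This is only a minimal sketch: tables_from_comments is a name I made up, and sample_html is a hand-written stand-in for the real page (the cell content is invented), so adapt it to the HTML you actually download.

```python
from bs4 import BeautifulSoup, Comment


def tables_from_comments(html):
    """Return every <table> found inside HTML comments in `html`."""
    soup = BeautifulSoup(html, 'html.parser')
    tables = []
    # Comments are parsed as Comment string nodes, not as tags,
    # which is why find('table') never sees their contents.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # Re-parse the comment body so its markup becomes real tags.
        inner = BeautifulSoup(str(comment), 'html.parser')
        tables.extend(inner.find_all('table'))
    return tables


# Hypothetical stand-in for the page: one visible table and one
# table hidden inside a comment (ids and cell text are made up,
# apart from the id used in the question).
sample_html = """
<div class="placeholder"></div>
<table id="visible"><tr><td>1</td></tr></table>
<!--
<table id="ChicagoCubsbatting"><tr><td>example</td></tr></table>
-->
"""

hidden = tables_from_comments(sample_html)
print([t.get('id') for t in hidden])
```

With the real page you would pass requests.get(url).text instead of sample_html and then pick the table you need by its id.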

Upvotes: 1
