Reputation: 278
I am trying to parse the data in this website: http://www.baseball-reference.com/boxes/CHN/CHN201606020.shtml
I want to extract some of the data in the tables, but for some reason I am struggling to find them. For example, this is what I want to do:
from bs4 import BeautifulSoup
import requests
url = 'http://www.baseball-reference.com/boxes/CHN/CHN201606020.shtml'
soup = BeautifulSoup(requests.get(url).text)
soup.find('table', id='ChicagoCubsbatting')
The final line returns nothing even though a table with that id exists in the HTML. Furthermore, len(soup.findAll('table')) returns 1 even though there are many tables on the page.
I've tried the 'lxml', 'html.parser' and 'html5lib' parsers. All behave the same way.
What is going on? Why does this not work and what can I do to extract the table?
Upvotes: 0
Views: 382
Reputation: 12158
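The tables you are looking for sit inside HTML comments, so BeautifulSoup treats them as comment text rather than as tags. Use soup.find('div', class_='placeholder').next_sibling.next_sibling to get the comment text, then build a new soup from that text.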
In [34]: text = soup.find('div', class_='placeholder').next_sibling.next_sibling
In [35]: new_soup = BeautifulSoup(text, 'lxml')
In [36]: new_soup.table
Out[36]:
<table class="teams poptip" data-tip="San Francisco Giants at Atlanta Braves">
<tbody>
<tr class="winner">
<td><a href="/teams/SFG/2016.shtml">SFG</a></td>
<td class="right">6</td>
<td class="right gamelink">
<a href="/boxes/ATL/ATL201606020.shtml">Final</a>
</td>
</tr>
<tr class="loser">
<td><a href="/teams/ATL/2016.shtml">ATL</a></td>
<td class="right">0</td>
<td class="right">
</td>
</tr>
</tbody>
</table>
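If you want a specific table by id rather than whatever follows the first placeholder div, a more general variant of the same idea is to walk every comment in the document and re-parse the ones that contain a table. The following is only a minimal sketch: the HTML string is a made-up stand-in for the real page, and find_commented_table is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup, Comment

# Stand-in markup (an assumption, not the real page): one ordinary table
# and one table that only exists inside an HTML comment.
html = """
<div class="placeholder"></div>
<table id="visible"><tr><td>shown</td></tr></table>
<!--
<table id="ChicagoCubsbatting"><tr><td>example</td></tr></table>
-->
"""


def find_commented_table(soup, table_id, parser="html.parser"):
    """Return the first table with the given id found inside any comment."""
    # Comment is a NavigableString subclass, so filter the strings by type.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if "<table" not in comment:
            continue  # skip comments that cannot contain a table
        # Re-parse the comment body as its own document.
        inner = BeautifulSoup(comment, parser)
        table = inner.find("table", id=table_id)
        if table is not None:
            return table
    return None


soup = BeautifulSoup(html, "html.parser")
visible_tables = soup.find_all("table")  # only the uncommented table
table = find_commented_table(soup, "ChicagoCubsbatting")
print(len(visible_tables), table is not None)
```

With the real page you would pass requests.get(url).text in place of the stand-in string. Whether every table on that page is wrapped in a comment is something to check against the actual HTML.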
Upvotes: 1