Reputation: 705
I'm trying to get the data from a table with a specific ID which I know. For some reason, the code keeps giving me a None result.
From the HTML code I'm trying to parse:
<table cellspacing="0" cellpadding="3" border="0" id="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1" style="width:100%;border-collapse:collapse;">
<tr class="gridHeader" valign="top">
<td class="titleGridRegNoB" align="center" valign="top"><span dir=RTL>שווי שוק (אלפי ש"ח)</span></td>
<td class="titleGridReg" align="center" valign="top">הון רשום למסחר</td>
<td class="titleGridReg" align="center" valign="top">שער נמוך</td><td class="titleGridReg" align="center" valign="top">שער גבוה</td>
<td class="titleGridReg" align="center" valign="top">שער בסיס</td>
<td class="titleGridReg" align="center" valign="top">שער פתיחה</td><td class="titleGridReg" align="center" valign="top"><span dir="rtl">שער נעילה (באגורות)</span></td>
<td class="titleGridReg" align="center" valign="top">שער נעילה מתואם</td><td class="titleGridReg" align="center" valign="top">תאריך</td>
</tr>
<tr onmouseover="this.style.backgroundColor='#FDF1D7'" onmouseout="this.style.backgroundColor='#ffffff'">
... And so on
My code:
html = br.response().read()
soup = BeautifulSoup(html)
table = soup.find(lambda tag: tag.name=='table' and tag.has_key('id') and tag['id']=="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1")
rows = table.findAll(lambda tag: tag.name=='tr')
In [100]: print table
None
Upvotes: 13
Views: 34520
Reputation: 28292
From the documentation:
table = soup.find('table', id="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1")
And for the rows line:
rows = table.findAll('tr')
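To try the two lines above without the real page, here is a minimal sketch. It assumes Python 3 with the bs4 package and the built-in "html.parser" backend, neither of which the question uses. The shortened id and the data row are made up for illustration; only the two Hebrew header cells come from the question's HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the page in the question.
# The id and the second row are invented; the header texts are from the original table.
TABLE_ID = "DataGrid1"
html = """
<table id="DataGrid1">
<tr class="gridHeader"><td>תאריך</td><td>שער בסיס</td></tr>
<tr><td>01/01/2013</td><td>100</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches, so check before using the result.
table = soup.find("table", id=TABLE_ID)
if table is None:
    raise ValueError("no table with id %r in the parsed document" % TABLE_ID)

# Collect the text of every cell, row by row.
rows = table.find_all("tr")
cells = [[td.get_text(strip=True) for td in row.find_all("td")] for row in rows]
print(cells)
```

If table comes back as None on the real page, checking explicitly like this makes it obvious that the lookup failed, instead of the error surfacing later on table.findAll.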
For the encoding problem, try decoding it from UTF-8 and then re-encoding it:
html = br.response().read().decode('utf-8')
soup = BeautifulSoup(html.encode('utf-8'))
Upvotes: 23
Reputation: 82550
Improving upon aiKid's answer:
# coding=utf-8
from bs4 import BeautifulSoup
html = u"""
<table cellspacing="0" cellpadding="3" border="0" id="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1" style="width:100%;border-collapse:collapse;">
<tr class="gridHeader" valign="top">
<td class="titleGridRegNoB" align="center" valign="top"><span dir=RTL>שווי שוק (אלפי ש"ח)</span></td><td class="titleGridReg" align="center" valign="top">הון רשום למסחר</td><td class="titleGridReg" align="center" valign="top">שער נמוך</td><td class="titleGridReg" align="center" valign="top">שער גבוה</td><td class="titleGridReg" align="center" valign="top">שער בסיס</td><td class="titleGridReg" align="center" valign="top">שער פתיחה</td><td class="titleGridReg" align="center" valign="top"><span dir="rtl">שער נעילה (באגורות)</span>
</td><td class="titleGridReg" align="center" valign="top">שער נעילה מתואם</td><td class="titleGridReg" align="center" valign="top">תאריך</td>
</tr><tr onmouseover="this.style.backgroundColor='#FDF1D7'" onmouseout="this.style.backgroundColor='#ffffff'">
"""
soup = BeautifulSoup(html)
print soup.find_all("table",
id="ctl00_SPWebPartManager1_g_c001c0d9_0cb8_4b0f_b75a_7cc3b6f7d790_ctl00_HistoryData1_gridHistoryData_DataGrid1")
Since you're working with UTF-8 data, you need to write the literal as a unicode string, like so: u"""(...)""". For data read from the page, all you need to do to work with unicode is this:
br.response().read().decode('utf-8')
The above decodes the UTF-8 bytes of the response into a unicode string. Say that string is stored in html: you can later encode it back into a regular UTF-8 byte string using html.encode("utf-8"). If you do this, you do not need to put the u in front of anything; you can treat everything as a regular string again.
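As a rough illustration of the decode/encode round trip, here is a minimal sketch in Python 3 (the code in this thread is Python 2, where the text type is called unicode and the byte type is str). The raw variable is a made-up stand-in for what br.response().read() would return.

```python
# Hypothetical stand-in for br.response().read(): raw UTF-8 bytes from the server.
raw = "<td>תאריך</td>".encode("utf-8")

# decode: bytes -> text (this is the unicode string)
text = raw.decode("utf-8")

# encode: text -> bytes (back to the UTF-8 byte string)
again = text.encode("utf-8")

# The byte string is longer than the text because each Hebrew letter
# takes more than one byte in UTF-8.
print(type(raw).__name__, len(raw))
print(type(text).__name__, len(text))
print(again == raw)
```

The point of the sketch is only the direction of each call: decode goes from bytes to text, and encode goes from text back to bytes.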
Upvotes: 1