Reputation: 1219
I am trying to parse a table and save it to a CSV file. However, some of the cells contain images (*.gif) of a checkmark, and I am unsure how to evaluate them when exporting to CSV.
Here is some HTML code:
<BODY>
<TABLE>
<TH>
<H3> <BR>TABLE 1 </H3>
</TH>
<TR>
<TD>Data 1 </TD>
<TD>Data 2 </TD>
</TR>
<TR>
<TD>example.gif </TD>
<TD>example.gif </TD>
</TR>
</TABLE>
</BODY>
In the actual table, the HTML for a table cell that contains the .gif is:
<td align="center" width="55px">
<!--
-->
<img align="top" height="13" hspace="2" src="http://explorer.natureserve.org/images/checkmark.gif" vspace="2" width="14"/>
<!--
-->
</td>
The code I have so far is:
table = soup.find('table')
rows = []
for row in table.find_all('tr'):
    rows.append([val.text.encode('utf8') for val in row.find_all('td')])
For the example provided, the code I have evaluates to:
[
'Spartina patens',
'G5',
'Graminoid',
'Herb (field)',
'\n\r\n \xc2\xa0\r\n\n',
'\n\n\n\n',
'\n\r\n \xc2\xa0\r\n\n',
'\xc2\xa0',
'\xc2\xa0'
]
I am guessing that if the cell doesn't include '\xc2\xa0', then I could evaluate it to a 1, but I am not sure how to do this. Any help would be appreciated.
What I would like to do is place a 1 in the appropriate row and column if the image is present and a 0 otherwise.
Upvotes: 1
Views: 593
Reputation: 473873
Check whether there is an img inside every td in the loop:
for row in table.find_all('tr'):
    rows.append([1 if val.img else 0 for val in row.find_all('td')])
Or a bit trickier:
[int(val.img is not None) for val in row.find_all('td')]
where val.img is a shortcut for val.find('img').
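Note that the list comprehensions above turn every cell into 1 or 0, including cells that hold text such as 'Spartina patens'. If the text columns should survive in the CSV, one possible variation is sketched below. It is only a sketch under assumptions: the sample markup, the helper names cell_value and table_to_rows, and the rule "keep non-empty text, otherwise 1 if an img is present, else 0" are illustrative and not taken from the real page.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the cell shown in the question.
HTML = """
<table>
  <tr>
    <td>Spartina patens</td>
    <td>G5</td>
    <td align="center"><img src="http://explorer.natureserve.org/images/checkmark.gif"/></td>
    <td>&nbsp;</td>
  </tr>
</table>
"""


def cell_value(td):
    """Return the cell text, or 1/0 when the cell has no visible text."""
    # strip=True drops whitespace-only strings, including non-breaking spaces.
    text = td.get_text(strip=True)
    if text:
        return text
    # Empty cell: 1 if it contains an <img>, 0 otherwise.
    return 1 if td.find("img") is not None else 0


def table_to_rows(html):
    """Parse the first table in the markup into a list of row lists."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    return [[cell_value(td) for td in tr.find_all("td")]
            for tr in table.find_all("tr")]


rows = table_to_rows(HTML)

# Written to an in-memory buffer here; pass an open file object to save to disk.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
```

With the sample markup, rows holds one row with the two text cells kept as strings, a 1 for the checkmark cell, and a 0 for the cell that only contains a non-breaking space.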
Upvotes: 1