user44796

Reputation: 1219

Evaluate image in html table using Python

I am trying to parse an HTML table and save it to a CSV file. However, some of the cells contain an image (*.gif) of a checkmark, and I am unsure how to evaluate those cells when exporting to CSV.

Here is some example HTML:

<BODY>
<TABLE>
<TH>
<H3>    <BR>TABLE 1    </H3> 
</TH>
<TR>
<TD>Data 1    </TD>
<TD>Data 2    </TD>
</TR>
<TR>
<TD>example.gif    </TD>
<TD>example.gif   </TD>
</TR>
</TABLE>
</BODY>

In the actual table, the HTML for a cell that includes the .gif is:

<td align="center" width="55px">
<!--
-->
<img align="top" height="13" hspace="2" src="http://explorer.natureserve.org/images/checkmark.gif" vspace="2" width="14"/>
<!--
-->
</td>

The code I have so far is:

table = soup.find('table')
rows = []

for row in table.find_all('tr'):
    rows.append([val.text.encode('utf8') for val in row.find_all('td')])

For one row of the actual table, the code I have evaluates to:

[
    'Spartina patens', 
    'G5', 
    'Graminoid',
    'Herb (field)', 
    '\n\r\n                        \xc2\xa0\r\n\n', 
    '\n\n\n\n', 
    '\n\r\n                       \xc2\xa0\r\n\n', 
    '\xc2\xa0', 
    '\xc2\xa0'
 ]

I am guessing that if a cell doesn't include '\xc2\xa0' (a UTF-8 encoded non-breaking space), then I could evaluate it to 1, but I am not sure how to do this. Any help would be appreciated.

What I would like to do is place a 1 in the appropriate row and column if the image is present and a 0 otherwise.

Upvotes: 1

Views: 593

Answers (1)

alecxe

Reputation: 473873

Inside the loop, check whether each td contains an img:

for row in table.find_all('tr'):
    rows.append([1 if val.img else 0 for val in row.find_all('td')])

Or a bit trickier:

[int(val.img is not None) for val in row.find_all('td')]

where val.img is a shortcut to val.find('img').
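For completeness, here is a minimal end-to-end sketch of the same idea, assuming BeautifulSoup 4 with the built-in "html.parser". The sample markup, the helper names cell_value and table_to_rows, and the choice to keep the text of non-empty cells (rather than turning every cell into 1 or 0) are assumptions for illustration, not part of the original question or answer:

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup modeled on the question:
# one text cell, one checkmark cell, one cell holding only a non-breaking space.
HTML = """
<table>
<tr>
<td>Spartina patens</td>
<td align="center" width="55px">
<img src="http://explorer.natureserve.org/images/checkmark.gif"/>
</td>
<td>&nbsp;</td>
</tr>
</table>
"""


def cell_value(td):
    """Return 1 if the cell contains an <img>, 0 if it is empty, else its text."""
    if td.find("img") is not None:
        return 1
    # strip=True also drops non-breaking spaces, so "&nbsp;" cells become empty.
    text = td.get_text(strip=True)
    return text if text else 0


def table_to_rows(html):
    """Parse the first table in the markup into a list of rows of cell values."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    return [[cell_value(td) for td in tr.find_all("td")]
            for tr in table.find_all("tr")]


rows = table_to_rows(HTML)

# Write to an in-memory buffer here; swap in open("out.csv", "w", newline="") for a real file.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

If every cell in the affected columns is either a checkmark or blank, the one-line list comprehension above is enough; the helper only matters when text columns and checkmark columns share a row.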

Upvotes: 1
