claws

Reputation: 54100

Why do I get None when text is available in the table column?

I'm doing web scraping with Beautiful Soup, and I'm new to it.

Question 1: Here is the Table:

<table width="75%" align=center>
    <tr>
        <td><STRONG><font face="Arial" size=2>S.No:</font></STRONG></td>
        <td><font face="Arial" size=2> 1635925</font></td>
    </tr>
    <tr>
        <td><FONT size=2><STRONG><font face="Arial">Name:</font><br></STRONG></FONT></td>
        <td><font face="Arial" size=2> <b>Alex</b></font></td>
    </tr>
    <tr>
        <td><STRONG><font face="Arial" size=2>Dog's Name:</font></STRONG></td>
        <td><font face="Arial" size=2> Tiger</font></td>
    </tr>
    <tr>
        <td><STRONG><font face="Arial" size=2 >Cat's Name:</font></STRONG></td>
        <td><font face="Arial" size=2>Pussy</font></td>
    </tr>
</table>

Here is the code that reads the table above:

for row in soup('table')[4]('tr'):
  tds = row('td')
  print tds[0].string, tds[1].string

Here is the output:

S.No:  1635925
None None
Dog's Name:  Tiger
Cat's Name: Pussy

The problem is row 2: why do both columns print None?

Question 2: Similar problem as above

  <tr bgcolor="#ffffff">
    <td align="middle"><font face="Arial" size=2>503</font></td>
    <td align="left"><font face="Arial" size=2>Text1</font></td>
    <td align="left"><font face="Arial" size=2>---</font></td>
    <td align="middle"><font face="Arial" size=2>2</font></td>
  </tr>  

   <tr bgcolor="#e6e6fa">
          <td colspan=4><font face="Arial" size=2>&nbsp;&nbsp;some random text</font></td>
   </tr>
   <tr >
    <td align="middle"><font face="Arial" size=2>048</font> </td>
    <td align="left"><font face="Arial" size=2>Text 2</font></td>
    <td align="left"><font face="Arial" size=2>187 &nbsp;&nbsp;&nbsp;&nbsp;</font></td>
    <td align="middle"><font face="Arial" size=2>2</font></td>
  </tr>

My code:

for row in soup('table')[5]('tr'):
    tds = row('td')
    if len(tds) == 4:
        print tds[0].string, tds[1].string, tds[2].string, tds[3].string

Output:

503 Text1 --- 2
None Text2 187     2

Why is the text of the first column None instead of 048?

Upvotes: 1

Views: 135

Answers (2)

abarnert

Reputation: 365717

The problem is that the td elements in the second row don't boil down to a single string. Somewhere inside each one, a tag has two children: a font and a br in the first cell, a whitespace string and a b in the second. So string has no unambiguous value, and it returns None.

You can see this if you break it down into pieces:

>>> table = soup('table')[4]
>>> row = table('tr')[1]
>>> col = row('td')[0]
>>> font = col('font')[0]
>>> strong = font('strong')[0]
>>> font2 = strong('font')[0]
>>> strong
<strong><font face="Arial">Name:</font><br/></strong>
>>> strong.string
>>> font2
<font face="Arial">Name:</font>
>>> font2.string
u'Name:'

If you want the textual representation of all of the strings within an element, use text instead of string:

>>> strong.text
u'Name:'
>>> font.text
u'Name:'
>>> col.text
u'Name:'
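
The same reasoning seems to explain Question 2. Here is a minimal sketch, assuming Python 3 and the bs4 package with the built-in "html.parser" (the thread itself uses Python 2 syntax, so treat the setup as an assumption). It parses only the first two cells of the 048 row, copied from the question, where a space sits between </font> and </td>:

```python
from bs4 import BeautifulSoup

# Two cells copied from Question 2; note the space between </font> and </td>
# in the first one.
html = (
    '<table><tr>'
    '<td align="middle"><font face="Arial" size=2>048</font> </td>'
    '<td align="left"><font face="Arial" size=2>Text 2</font></td>'
    '</tr></table>'
)
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("td")

# The first cell has two children: the <font> tag and the whitespace string.
print(len(first.contents))
print(first.string)                 # None, because there is no single child
print(second.string)                # the only child is <font>, which holds one string
print(first.get_text(strip=True))   # joins all strings and strips whitespace
```

The trailing space counts as a second child of the td, which is enough to make string ambiguous. Using get_text(strip=True) rather than text also drops that stray whitespace from the result.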

Upvotes: 1

alecxe

Reputation: 473873

Try text instead of string. For example:

for row in soup('table')[4]('tr'):
  tds = row('td')
  print tds[0].text, tds[1].text

prints:

S.No:  1635925
Name:  Alex
Dog's Name:  Tiger
Cat's Name: Pussy

According to the docs, string is None if the element has more than one child:

For your convenience, if a tag has only one child node, and that child node is a string, the child node is made available as tag.string, as well as tag.contents[0].
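
To see that rule in action, here is a small sketch, assuming Python 3 and bs4 with "html.parser". The two cells are simplified from Question 1, and cell_text is a hypothetical helper name, not part of Beautiful Soup:

```python
from bs4 import BeautifulSoup

# Two cells simplified from Question 1 (attributes removed for brevity).
soup = BeautifulSoup(
    "<td><font> Tiger</font></td>"
    "<td><font> <b>Alex</b></font></td>",
    "html.parser",
)
single, mixed = soup.find_all("td")


def cell_text(td):
    """Hypothetical helper: all text inside a cell, whitespace stripped."""
    return td.get_text(strip=True)


# <font> holds exactly one string, and <td> holds exactly one <font>,
# so .string passes straight through to the text.
print(repr(single.string))

# Here <font> holds a whitespace string plus a <b> tag: two children,
# so .string is None for the <font> and for the enclosing <td>.
print(len(mixed.font.contents), mixed.string)

# get_text() does not care how many children there are.
print(cell_text(single), cell_text(mixed))
```

A helper like this can replace tds[i].string in the loops from the question when the markup is not uniform from row to row.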

Upvotes: 1
