thatleviathan

Reputation: 41

BeautifulSoup attrs returning list instead of dictionary

I'm trying to parse some HTML that I've scraped, and I'm running into an odd issue. I need to find a <td> tag that contains an <a> tag with a certain name, and then dump the contents of the entire <td> tag. For now I'm just trying to print the value of the "name" attribute of the <a> tag. My understanding is that for a single element (as opposed to a list of elements), the element's "attrs" should be a dictionary, so I should be able to pull out the value by string key:

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (pre-bs4) import

soup = BeautifulSoup(html)
for tdblock in soup.findAll('td'):
    try:
        for ablock in tdblock.findAll('a'):
            print ablock.attrs['name']
    except AttributeError:
        pass

(The try/except block is there because not all of the <td> tags in the HTML contain <a> tags.)

But it throws a TypeError:

Traceback (most recent call last):
  File "fetch_historic_nfl_odds.py", line 26, in <module>
    print ablock.attrs['name']
TypeError: list indices must be integers, not str

And if I modify the code to just print ablock.attrs, it's clearly a list, not a dictionary:

[(u'name', u'EMAIL')]

I've seen some posts on Stack Overflow saying that you get a list if you try to read the attributes of a findAll() result, but I'm going element by element, so it's unclear why that would happen here.

I've also tried using find() to get just the first <a> tag, but "attrs" is still a list.

Grabbing what I need by integer index works, but I can't rely on the data always being at the same position in the list. I know I can use findAll to search for elements by attribute value, but I need to match only the first few words of the string in the name attribute, so I don't think that would work.
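Under BeautifulSoup 3, `attrs` is a list of (name, value) tuples, so the position of `name` is not guaranteed. A minimal sketch of a position-independent lookup, assuming `attrs` has exactly that list-of-tuples shape; `attrs_to_dict` is a hypothetical helper name, not part of any BeautifulSoup API. (BeautifulSoup 3 tags also accept `ablock['name']` and `ablock.get('name')`, which look the attribute up by key rather than by position.)

```python
def attrs_to_dict(attrs):
    """Hypothetical helper: turn a BeautifulSoup 3-style attribute list
    of (name, value) tuples into a dict so lookups are by key."""
    # dict() accepts any iterable of 2-tuples; if a name repeats,
    # the last value wins.
    return dict(attrs)


# The shape printed in the question for ablock.attrs
print(attrs_to_dict([(u'name', u'EMAIL')])['name'])  # EMAIL

# Hypothetical extra attribute ahead of 'name': lookup still works by key
attrs = [(u'href', u'#week1'), (u'name', u'Closing NFL Odds Week 1, 2006')]
print(attrs_to_dict(attrs)['name'])  # Closing NFL Odds Week 1, 2006
```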

EDIT: Here's a snippet of the HTML code I'm trying to parse, via soup.prettify():

<table width="644" border="0" cellpadding="3" cellspacing="0">
 <tr>
  <td>
   <br />
   <a name="Closing NFL Odds Week 1, 2006">
   </a>
   <center>
    <font face="Georgia, Times New Roman, Times, serif">
     <span style="font-size:14.0pt;font-family:Georgia">
      <b>
       Closing Las Vegas NFL Odds From Week 1, 2006
       <br />
       Week One NFL Football Odds
       <br />
       Pro Football Game Odds 9/7 - 9/11, 2006
      </b>
     </span>
    </font>
   </center>

What I want is to check whether that first <a> tag has a "name" attribute that starts with "Closing NFL Odds" and, if it does, return the whole <td> block for further parsing.

Further Edit: I'm using Python 2.7.12, and the non-bs4 BeautifulSoup, in case that's relevant.
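The goal above can be sketched with bs4 as follows. This is an illustration, not the asker's code: it assumes the `beautifulsoup4` package, Python's built-in `html.parser`, and Python 3, whereas the question uses BeautifulSoup 3 on Python 2.7. `closing_odds_cells` and `PREFIX` are hypothetical names.

```python
from bs4 import BeautifulSoup

PREFIX = 'Closing NFL Odds'  # hypothetical constant for the name prefix


def closing_odds_cells(html):
    """Return each <td> whose first <a> tag has a name attribute
    starting with PREFIX."""
    soup = BeautifulSoup(html, 'html.parser')
    cells = []
    for td in soup.find_all('td'):
        first_a = td.find('a')  # first <a> anywhere inside this cell, or None
        if first_a is None:
            continue  # cells without an <a> are skipped; no try/except needed
        # .get('name') reads the HTML attribute; first_a.name would be the
        # tag's own name, 'a'. Default to '' when the attribute is absent.
        if first_a.get('name', '').startswith(PREFIX):
            cells.append(td)
    return cells
```

Each returned item is a bs4 `Tag`, so the whole cell can be handed on for further parsing. If the "first <a>" condition doesn't matter, bs4 also accepts a compiled regular expression as an attribute filter, for example `soup.find_all('a', attrs={'name': re.compile(r'^Closing NFL Odds')})`, which matches only the start of the name; `find_parent('td')` then climbs from each match back to its cell.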

Upvotes: 2

Views: 679

Answers (1)

thatleviathan

Reputation: 41

jwodder had it right: BeautifulSoup versions before 4 return a tag's attrs as a list of (name, value) tuples rather than a dictionary. I upgraded to bs4, and now it works. Thanks, all!
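For comparison, a minimal sketch of the same lookup under bs4, where `attrs` is a dictionary. It assumes the `beautifulsoup4` package, the built-in `html.parser`, and Python 3; the one-tag HTML string is made up for illustration, reusing the `EMAIL` value from the question.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<td><a name="EMAIL"></a></td>', 'html.parser')
ablock = soup.find('a')

# In bs4, Tag.attrs is a dictionary keyed by attribute name,
# so string-key lookup works regardless of attribute order.
print(isinstance(ablock.attrs, dict))  # True
print(ablock.attrs['name'])            # EMAIL
print(ablock.get('name'))              # EMAIL
print(ablock.get('href'))              # None: get() returns None for a missing attribute
```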

Upvotes: 1
