NullUserException

Reputation: 85468

Problem accessing attributes in BeautifulSoup

I am having problems checking for tag attributes with BeautifulSoup in Python 2.7. The code basically consists of:

from BeautifulSoup import BeautifulStoneSoup

str = '<el at="some">ABC</el><el>DEF</el>'
z = BeautifulStoneSoup(str)

for x in z.findAll('el'):
    # The two checks I tried (uncomment one at a time):
    # if 'at' in x:
    # if hasattr(x, 'at'):
        print x['at']
    else:
        print 'nothing'

I expected the first if statement to work correctly (i.e., print "nothing" if at doesn't exist), but its condition is always False, so it always prints "nothing". The second if, on the other hand, is always True, so the code raises a KeyError when it tries to access at on the second <el> element, which of course doesn't have it.

Upvotes: 6

Views: 5588

Answers (4)

Eli Bendersky

Reputation: 273516

The in operator is for sequence and mapping types; what makes you think the object returned by BeautifulSoup is supposed to implement it the way you expect? According to the BeautifulSoup docs, you should access attributes using the [] syntax.

Re hasattr, I think you confused HTML/XML attributes and Python object attributes. hasattr is for the latter, and BeautifulSoup AFAIK doesn't reflect the HTML/XML attributes it parsed in its own object attributes.

P.S. note that the Tag object in BeautifulSoup does implement __contains__ - so maybe you're trying with the wrong object? Can you show a complete but minimal example that demonstrates the problem?


Running this:

from BeautifulSoup import BeautifulSoup

str = '<el at="some">ABC</el><el>DEF</el>'
z = BeautifulSoup(str)

for x in z.findAll('el'):
    print type(x)
    print x['at']

I get:

<class 'BeautifulSoup.Tag'>
some
<class 'BeautifulSoup.Tag'>
Traceback (most recent call last):
  File "soup4.py", line 8, in <module>
    print x['at']
  File "C:\Python26\lib\site-packages\BeautifulSoup.py", line 601, in __getitem__
    return self._getAttrMap()[key]
KeyError: 'at'

Which is what I expected. The first el has an at attribute and the second doesn't, so accessing it there throws a KeyError.


Update 2: BeautifulSoup.Tag.__contains__ looks inside the contents of the tag, not its attributes, so 'at' in x is not an attribute test. To check whether an attribute exists, use has_key (for example, x.has_key('at')).
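
To make the three kinds of lookup concrete, here is a minimal sketch using a hypothetical FakeTag class. It is a simplified stand-in written for illustration, not BeautifulSoup's actual code; in particular, the __getattr__ that returns None for unknown names is an assumption about why hasattr always reports True in the question.

```python
class FakeTag(object):
    """Hypothetical stand-in for a parsed tag (not the real BeautifulSoup Tag)."""

    def __init__(self, attrs, contents):
        self.attrs = dict(attrs)        # markup attributes, e.g. at="some"
        self.contents = list(contents)  # child nodes / text

    def __getitem__(self, key):
        # tag['at'] reads a markup attribute; raises KeyError if it is absent
        return self.attrs[key]

    def __contains__(self, item):
        # "item in tag" searches the children, not the markup attributes
        return item in self.contents

    def __getattr__(self, name):
        # Only called when normal lookup fails. Returning None instead of
        # raising AttributeError makes hasattr() report True for any name.
        if name.startswith('__'):
            raise AttributeError(name)
        return None


def describe(tag):
    # (children test, Python-attribute test, markup-attribute test)
    return ('at' in tag, hasattr(tag, 'at'), 'at' in tag.attrs)


first = FakeTag({'at': 'some'}, ['ABC'])
second = FakeTag({}, ['DEF'])

print(describe(first))   # (False, True, True)
print(describe(second))  # (False, True, False)
```

Only the third check distinguishes the two elements, which matches the behaviour described in the question: the in test is always False and the hasattr test is always True.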

Upvotes: 7

PaulMcG

Reputation: 63739

To just scan for an element by tag name, a pyparsing solution might be more readable (and it doesn't rely on deprecated APIs like has_key):

from pyparsing import makeXMLTags

# sample input (same markup as in the question)
xmltext = '<el at="some">ABC</el><el>DEF</el>'

# makeXMLTags creates a pyparsing expression that matches tags with
# variations in whitespace, attributes, etc.
el, elEnd = makeXMLTags('el')

# scan the input text and work with elTags
for elTag, tagstart, tagend in el.scanString(xmltext):
    if elTag.at:
        print elTag.at

For an added refinement, pyparsing allows you to define a filtering parse action so that tags will only match if a particular attribute-value (or attribute-anyvalue) is found:

# import parse action that will filter by attribute
from pyparsing import withAttribute

# only match el tags having the 'at' attribute, with any value
el.setParseAction(withAttribute(at=withAttribute.ANY_VALUE))

# now loop again, but no need to test for presence of 'at'
# attribute - there will be no match if 'at' is not present
for elTag, tagstart, tagend in el.scanString(xmltext):
    print elTag.at

Upvotes: 1

Dzinx

Reputation: 57804

If your code is as simple as what you posted, you can solve it compactly with:

for x in z.findAll('el'):
    print x.get('at', 'nothing')

Upvotes: 1

user2665694

I usually use the get() method to access attributes:

link = soup.find('a')
href = link.get('href')
name = link.get('name')

if name:
    print 'anchor'
if href:
    print 'link'
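
One caveat with the truthiness tests above: get() returns None for a missing attribute, but an attribute that is present with an empty value is also falsy. The sketch below shows the difference using the standard library's xml.etree.ElementTree as a stand-in (a swapped-in parser, chosen only because its Element.get follows the same dict-style convention); the sample markup is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: href is present but empty, title is absent
link = ET.fromstring('<a href="" name="top">text</a>')

href = link.get('href')    # attribute present, empty value
name = link.get('name')    # attribute present, non-empty value
title = link.get('title')  # attribute absent, so get() returns None

if href is not None:
    print('href attribute exists (value may be empty)')
if not href:
    print('href is empty or missing')
if title is None:
    print('no title attribute')
```

If an empty value should still count as "present", compare against None rather than relying on truthiness.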

Upvotes: 0
