Reputation: 85468
I am having problems checking whether a tag has an attribute, using BeautifulSoup in Python (2.7). The code basically consists of:
from BeautifulSoup import BeautifulStoneSoup

str = '<el at="some">ABC</el><el>DEF</el>'
z = BeautifulStoneSoup(str)
for x in z.findAll('el'):
    # first attempt:
    # if 'at' in x:
    # second attempt:
    # if hasattr(x, 'at'):
        print x['at']
    else:
        print 'nothing'
I expected the first if statement to work correctly (i.e., if the 'at' attribute doesn't exist, print "nothing"), but it always prints "nothing" (i.e., the test is always False). The second if, on the other hand, is always True, which causes the code to raise a KeyError when it tries to access 'at' on the second <el> element, which of course doesn't have it.
Upvotes: 6
Views: 5588
Reputation: 273516
The in operator is for sequence and mapping types. What makes you think the object returned by BeautifulSoup is supposed to implement it the way you expect? According to the BeautifulSoup docs, you should access attributes using the [] syntax.
Regarding hasattr, I think you confused HTML/XML attributes with Python object attributes. hasattr is for the latter, and BeautifulSoup, AFAIK, doesn't reflect the HTML/XML attributes it parsed in its own object attributes.
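To make the hasattr point concrete, here is a minimal sketch that runs without BeautifulSoup. StandInTag is a hypothetical class written for this illustration, not the library's real Tag. It assumes that unknown attribute names are treated as child-tag lookups that return None instead of raising AttributeError, which would explain why hasattr is always True in the question.

```python
class StandInTag(object):
    """Hypothetical stand-in for a parsed tag; not BeautifulSoup's real Tag."""

    def __init__(self, name, attrs):
        self.name = name          # a genuine Python object attribute
        self.attrs = dict(attrs)  # markup attributes live in a separate mapping

    def __getattr__(self, child_name):
        # Runs only when normal attribute lookup fails. Assumption: unknown
        # names are treated as child-tag lookups that yield None when nothing
        # matches, rather than raising AttributeError.
        if child_name.startswith('__'):
            raise AttributeError(child_name)
        return None


first = StandInTag('el', {'at': 'some'})
second = StandInTag('el', {})

# hasattr() only asks whether getattr() raises, so it is True for both tags.
print(hasattr(first, 'at'))   # True
print(hasattr(second, 'at'))  # True

# The markup attribute has to be looked up in the mapping instead.
print('at' in first.attrs)    # True
print('at' in second.attrs)   # False
```

So hasattr says nothing about the markup here; only the attribute mapping does.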
P.S. Note that the Tag object in BeautifulSoup does implement __contains__, so maybe you're trying with the wrong object? Can you show a complete but minimal example that demonstrates the problem?
Running this:
from BeautifulSoup import BeautifulSoup

str = '<el at="some">ABC</el><el>DEF</el>'
z = BeautifulSoup(str)
for x in z.findAll('el'):
    print type(x)
    print x['at']
I get:
<class 'BeautifulSoup.Tag'>
some
<class 'BeautifulSoup.Tag'>
Traceback (most recent call last):
  File "soup4.py", line 8, in <module>
    print x['at']
  File "C:\Python26\lib\site-packages\BeautifulSoup.py", line 601, in __getitem__
    return self._getAttrMap()[key]
KeyError: 'at'
Which is what I expected. The first el has an at attribute, the second doesn't, and this throws a KeyError.
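Update 2: BeautifulSoup.Tag.__contains__ looks inside the contents of the tag, not its attributes, which is why 'at' in x is always False. To check whether an attribute exists, use has_key, e.g. x.has_key('at') (in the newer bs4 package the equivalent is has_attr).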
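The difference between a tag's contents and its attributes can be sketched without BeautifulSoup installed. StandInTag below is a hypothetical class for illustration only, and has_attribute is an invented helper standing in for an explicit attribute-existence check; neither is part of the library.

```python
class StandInTag(object):
    """Hypothetical stand-in for a parsed tag; not BeautifulSoup's real Tag."""

    def __init__(self, attrs, contents):
        self.attrs = dict(attrs)        # markup attributes, e.g. at="some"
        self.contents = list(contents)  # child nodes and text

    def __contains__(self, item):
        # `item in tag` searches the children only, never the attributes
        return item in self.contents

    def __getitem__(self, key):
        # tag[key] reads a markup attribute; KeyError if it is missing
        return self.attrs[key]

    def has_attribute(self, key):
        # hypothetical helper: explicit attribute-existence check
        return key in self.attrs


def describe(tag, key):
    """Return the attribute value, or 'nothing' when the attribute is absent."""
    return tag[key] if tag.has_attribute(key) else 'nothing'


first = StandInTag({'at': 'some'}, ['ABC'])
second = StandInTag({}, ['DEF'])

print('at' in first)           # False: 'at' is not among the children
print('ABC' in first)          # True: 'ABC' is a child
print(describe(first, 'at'))   # some
print(describe(second, 'at'))  # nothing
```

Checking for the attribute before indexing is what avoids the KeyError on the second element.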
Upvotes: 7
Reputation: 63739
To just scan for an element by tag name, a pyparsing solution might be more readable (and avoids deprecated APIs like has_key):
from pyparsing import makeXMLTags

# sample input, same markup as in the question
xmltext = '<el at="some">ABC</el><el>DEF</el>'

# makeXMLTags creates a pyparsing expression that matches tags with
# variations in whitespace, attributes, etc.
el, elEnd = makeXMLTags('el')

# scan the input text and work with elTags
for elTag, tagstart, tagend in el.scanString(xmltext):
    if elTag.at:
        print elTag.at
For an added refinement, pyparsing allows you to define a filtering parse action so that tags will only match if a particular attribute-value (or attribute-anyvalue) is found:
# import parse action that will filter by attribute
from pyparsing import withAttribute
# only match el tags having the 'at' attribute, with any value
el.setParseAction(withAttribute(at=withAttribute.ANY_VALUE))
# now loop again, but no need to test for presence of 'at'
# attribute - there will be no match if 'at' is not present
for elTag, tagstart, tagend in el.scanString(xmltext):
    print elTag.at
Upvotes: 1
Reputation: 57804
If your code is as simple as you provided, you can solve it in a compact way with:
for x in z.findAll('el'):
    print x.get('at', 'nothing')
Upvotes: 1
Reputation:
I usually use the get() method to access attributes:
link = soup.find('a')
href = link.get('href')
name = link.get('name')
if name:
    print 'anchor'
if href:
    print 'link'
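One caveat with testing the result of get() for truthiness: an attribute that is present but empty (for example href="") is treated the same as a missing one. The sketch below uses a plain dict as a stand-in for a tag's attribute map, on the assumption that the tag's get() behaves like dict.get; the two function names are made up for this example.

```python
# A plain dict stands in for a tag's attribute map (an assumption made so the
# example runs without BeautifulSoup installed).
attrs = {'href': '', 'name': 'top'}


def present_by_truthiness(attrs, key):
    # Treats empty values the same as missing ones
    return bool(attrs.get(key))


def present_by_none_check(attrs, key):
    # Distinguishes "missing" (None) from "present but empty"
    return attrs.get(key) is not None


print(present_by_truthiness(attrs, 'href'))   # False: value is the empty string
print(present_by_none_check(attrs, 'href'))   # True: the attribute exists
print(present_by_none_check(attrs, 'title'))  # False: the attribute is missing
```

If empty values should count as "not a link", the truthiness test is fine; otherwise compare against None.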
Upvotes: 0