BeautifulSoup is handy for HTML parsing in Python, but the result of the code below confuses me:
from bs4 import BeautifulSoup
tr ="""
<table>
<tr class="passed" id="row1"><td>t1</td></tr>
<tr class="failed" id="row2"><td>t2</td></tr>
</table>
"""
table = BeautifulSoup(tr,"html.parser")
for row in table.find_all("tr"):
    print(row["class"])
    print(row["id"])
Result:
[u'passed']
row1
[u'failed']
row2
Why is the class attribute returned as a list, while id is returned as a plain value?
I am using beautifulsoup4 4.5.0 with Python 2.7.
Because elements may have multiple classes.
Consider this example:
from bs4 import BeautifulSoup
tr ="""
<table>
<tr class="passed a b c" id="row1"><td>t1</td></tr>
<tr class="failed" id="row2"><td>t2</td></tr>
</table>
"""
table = BeautifulSoup(tr,"html.parser")
for row in table.find_all("tr"):
    print(row["class"])
    print(row["id"])
Output:
['passed', 'a', 'b', 'c']
row1
['failed']
row2
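To make the splitting rule concrete, here is a minimal sketch that needs only the Python 3 standard library (html.parser), not BeautifulSoup. It mimics what bs4 does for this markup: the class value is split on whitespace into a list, while id stays a plain string. The names RowAttrParser and MULTI_VALUED are hypothetical, and the attribute set is deliberately reduced to class; this illustrates the behaviour and is not bs4's implementation.

```python
from html.parser import HTMLParser

# Hypothetical, reduced set: bs4 treats more attributes than this as multi-valued.
MULTI_VALUED = {"class"}


class RowAttrParser(HTMLParser):
    """Collect the attributes of every <tr>, splitting multi-valued ones."""

    def __init__(self):
        super().__init__()
        self.rows = []

    def handle_starttag(self, tag, attrs):
        if tag != "tr":
            return
        row = {}
        for name, value in attrs:
            if name in MULTI_VALUED:
                # Whitespace-separated tokens become a list, like row["class"].
                # (value is None for a bare attribute such as <tr class>.)
                row[name] = (value or "").split()
            else:
                # Everything else stays a plain string, like row["id"].
                row[name] = value
        self.rows.append(row)


markup = """
<table>
<tr class="passed a b c" id="row1"><td>t1</td></tr>
<tr class="failed" id="row2"><td>t2</td></tr>
</table>
"""

parser = RowAttrParser()
parser.feed(markup)
parser.close()
for row in parser.rows:
    print(row["class"])
    print(row["id"])
```

Running it prints the same four lines as the bs4 example above.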
class is a special multi-valued attribute in BeautifulSoup. From the documentation:
HTML 4 defines a few attributes that can have multiple values. HTML 5 removes a couple of them, but defines a few more. The most common multi-valued attribute is class (that is, a tag can have more than one CSS class).
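Which attributes count as multi-valued depends on the tag: some (like class) apply to every element, others only to specific tags. bs4 keeps such a mapping on its HTML tree builder (in recent source it is HTMLTreeBuilder.DEFAULT_CDATA_LIST_ATTRIBUTES, and its exact contents vary by version). The Python 3 sketch below models that lookup with an assumed subset of the table; MULTI_VALUED, is_multi_valued and convert are hypothetical names, not bs4 API.

```python
# Assumed subset of a tag -> multi-valued-attributes table; "*" applies to every tag.
MULTI_VALUED = {
    "*": {"class", "accesskey"},
    "a": {"rel", "rev"},
    "td": {"headers"},
    "th": {"headers"},
}


def is_multi_valued(tag, attr):
    """Return True if attr on this tag should be split into a list of tokens."""
    return attr in MULTI_VALUED["*"] or attr in MULTI_VALUED.get(tag, set())


def convert(tag, attr, value):
    """Split multi-valued attributes on whitespace; return others unchanged."""
    if is_multi_valued(tag, attr):
        return value.split()
    return value


print(convert("tr", "class", "passed a b c"))   # list: class is multi-valued everywhere
print(convert("tr", "id", "row1"))               # string: id is never multi-valued
print(convert("a", "rel", "nofollow noopener"))  # list: rel is multi-valued on <a>
print(convert("div", "rel", "nofollow"))         # string: rel is not listed for <div>
```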
This can sometimes be awkward to deal with, for instance when you want to apply a regular expression to the class attribute value as a whole.
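A small stdlib-only sketch of why this is awkward: once the value has been split into tokens, a pattern that spans several classes cannot match any single token, so you have to re-join the list first. The token list mirrors row["class"] from the example above; the pattern is a hypothetical one chosen for illustration.

```python
import re

# The class value as bs4 hands it back: already split into tokens.
classes = ["passed", "a", "b", "c"]

# Hypothetical goal: match rows whose class attribute, read as one string,
# starts with "passed a".
pattern = re.compile(r"^passed a\b")

# Testing each token on its own can never succeed, because no single
# token contains a space.
per_token = any(pattern.search(c) for c in classes)

# Re-joining the tokens rebuilds a whole-value string, so the regex works again.
whole_value = bool(pattern.search(" ".join(classes)))

print(per_token, whole_value)  # False True
```

Note that re-joining with a single space normalizes the original whitespace, so a pattern that depends on exact spacing (tabs, double spaces, newlines) will not see it.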
You can turn this behavior off by tweaking the tree builder, but I would not recommend doing it.
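One reason not to switch it off: code that checks for a class then has to tokenize the raw string itself, and the naive "in" test silently becomes a substring match. The Python 3 sketch below uses a hypothetical has_class helper to show the pitfall; split_off simply stands in for what a non-splitting parser would return. For reference, newer bs4 documentation describes passing multi_valued_attributes=None to the BeautifulSoup constructor to disable the splitting; whether 4.5.0 already supports that argument is an assumption to verify against your installed version.

```python
def has_class(class_value, name):
    """Check membership of a CSS class, whatever form the value takes."""
    if isinstance(class_value, str):
        # Splitting disabled: the value is one raw string, so tokenize it here;
        # a bare `name in class_value` would be a substring test.
        return name in class_value.split()
    # Splitting enabled (the default): the value is already a list of tokens.
    return name in class_value


split_on = ["passed"]  # what row["class"] looks like by default
split_off = "passed"   # what it would look like with splitting turned off

print("a" in split_off)               # True: substring match, misleading
print(has_class(split_off, "a"))      # False: token match
print(has_class(split_on, "passed"))  # True
```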