Larry Cai

Reputation: 60093

BeautifulSoup returns a list for the "class" attribute but a plain value for other attributes

BeautifulSoup is handy for HTML parsing in Python, but the result of the code below confuses me.

from bs4 import BeautifulSoup
tr ="""
<table>
    <tr class="passed" id="row1"><td>t1</td></tr>
    <tr class="failed" id="row2"><td>t2</td></tr>
</table>
"""
table = BeautifulSoup(tr,"html.parser")
for row in table.findAll("tr"):
    print row["class"]
    print row["id"]

result:

[u'passed']
row1
[u'failed']
row2 

Why is the class attribute returned as a list, while id is returned as a plain value?

I am using beautifulsoup4 4.5.0 with Python 2.7.

Upvotes: 1

Views: 1507

Answers (2)

DeepSpace

Reputation: 81654

Because elements may have multiple classes.

Consider this example:

from bs4 import BeautifulSoup

tr ="""
<table>
    <tr class="passed a b c" id="row1"><td>t1</td></tr>
    <tr class="failed" id="row2"><td>t2</td></tr>
</table>
"""
table = BeautifulSoup(tr,"html.parser")
for row in table.findAll("tr"):
    print row["class"]
    print row["id"]

result:

['passed', 'a', 'b', 'c']
row1
['failed']
row2
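
Since the value is a list, you can test for a single class or join the items back into one string. A minimal sketch, assuming the same markup as above and Python 3 style print calls; the names has_passed and class_string are only illustrative:

```python
from bs4 import BeautifulSoup

html = """
<table>
    <tr class="passed a b c" id="row1"><td>t1</td></tr>
    <tr class="failed" id="row2"><td>t2</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
first_row = soup.find("tr")

# Each CSS class is a separate list item, so "in" is an exact-match test.
has_passed = "passed" in first_row["class"]

# Join the items to get a single space-separated string back.
class_string = " ".join(first_row["class"])

print(has_passed)
print(class_string)
print(first_row["id"])  # id is not multi-valued, so it stays a plain string
```

Note that joining with a single space normalizes whitespace, so the result may differ from the original attribute text if that contained runs of spaces.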

Upvotes: 2

alecxe

Reputation: 474051

class is a special multi-valued attribute in BeautifulSoup:

HTML 4 defines a few attributes that can have multiple values. HTML 5 removes a couple of them, but defines a few more. The most common multi-valued attribute is class (that is, a tag can have more than one CSS class)

Sometimes this is awkward to deal with, for instance when you want to apply a regular expression to the class attribute value as a whole.
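
One way around that, sketched here under the assumption that a single-space join is an acceptable stand-in for the original attribute text, is to join the list yourself and run the regular expression on the result. The pattern below is hypothetical and only serves to span two classes:

```python
import re
from bs4 import BeautifulSoup

html = '<table><tr class="passed a b c" id="row1"><td>t1</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

# Hypothetical pattern: the value starts with "passed" followed by class "a".
pattern = re.compile(r"^passed\s+a\b")

# No single list item can satisfy a pattern that spans two classes...
matches_any_item = any(pattern.search(name) for name in row["class"])

# ...but the joined string can.
matches_whole = bool(pattern.search(" ".join(row["class"])))

print(matches_any_item)
print(matches_whole)
```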

You can turn this behavior off by tweaking the tree builder, but I would not recommend doing it.
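
For reference, a minimal sketch of what turning it off can look like. This assumes a newer Beautiful Soup than the 4.5.0 in the question (as far as I know, the multi_valued_attributes constructor argument arrived in 4.8.0); on older versions you would have to customize the tree builder class instead:

```python
from bs4 import BeautifulSoup

html = '<table><tr class="passed a b c" id="row1"><td>t1</td></tr></table>'

# Default behavior: class is split on whitespace into a list.
default_soup = BeautifulSoup(html, "html.parser")
default_class = default_soup.find("tr")["class"]

# Assumption: bs4 >= 4.8.0. Passing None disables multi-valued handling,
# so class comes back as the raw attribute string.
plain_soup = BeautifulSoup(html, "html.parser", multi_valued_attributes=None)
plain_class = plain_soup.find("tr")["class"]

print(default_class)
print(plain_class)
```

Keep in mind that this changes how every multi-valued attribute in the document is returned, not only class.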

Upvotes: 1
