michaelmeyer

Reputation: 8205

Get the content of a tag's attribute as a Unicode string in BeautifulSoup 4

According to the BeautifulSoup documentation, you can get the value of a tag's attribute with code like this:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
tag = soup.b

tag['class']

In theory (that is, according to the docs), the output should be:

u'boldest'

However, when I run the code above, it outputs:

['boldest']

Am I missing something? How can I get the content of a tag's attribute as a plain Unicode string?

Upvotes: 0

Views: 254

Answers (2)

kfunk

Reputation: 2082

Check this section in the documentation:

Multi-valued attributes

HTML 4 defines a few attributes that can have multiple values. HTML 5 removes a couple of them, but defines a few more. The most common multi-valued attribute is class (that is, a tag can have more than one CSS class). Others include rel, rev, accept-charset, headers, and accesskey. Beautiful Soup presents the value(s) of a multi-valued attribute as a list:
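The rule in that quote can be checked with a minimal sketch, assuming bs4 with Python's built-in html.parser (the markup is made up for illustration): class and rel on an a tag come back as lists, while single-valued attributes such as id and href come back as plain strings.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mixing multi-valued and single-valued attributes.
soup = BeautifulSoup(
    '<a id="home" class="nav active" rel="nofollow" href="/">Home</a>',
    "html.parser",
)
link = soup.a

print(link["class"])  # ['nav', 'active']  (multi-valued -> list)
print(link["rel"])    # ['nofollow']       (multi-valued on <a> -> list)
print(link["id"])     # home               (single-valued -> str)
print(link["href"])   # /                  (single-valued -> str)
```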

So tag['class'][0] will give you the first class as a plain string.
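If you would rather get the raw attribute text than a list, the Beautiful Soup documentation describes a constructor option, multi_valued_attributes=None, that turns the splitting off. A sketch, assuming a recent bs4 release that accepts this keyword (older versions may reject it):

```python
from bs4 import BeautifulSoup

html = '<b class="boldest big">Extremely bold</b>'

# Default behaviour: class is split into a list of class names.
default_soup = BeautifulSoup(html, "html.parser")
print(default_soup.b["class"])  # ['boldest', 'big']

# Assumption: this bs4 version supports multi_valued_attributes=None,
# which disables the splitting so the attribute comes back as one string.
raw_soup = BeautifulSoup(html, "html.parser", multi_valued_attributes=None)
print(raw_soup.b["class"])  # boldest big
```

Note that this applies to every multi-valued attribute in the document, not just class.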

Upvotes: 1

der_fenix

Reputation: 241

tag['class'][0]

A tag can have more than one class, which is why it returns a list of values. If you are sure there is only one class, just take the first element of the list.
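When a tag may have several classes, or none at all, indexing with [0] either drops classes or raises KeyError. A minimal sketch that instead joins all classes into one space-separated string; class_string is a hypothetical helper name, and the markup is made up for illustration:

```python
from bs4 import BeautifulSoup

html = '<b class="boldest big">A</b><b>B</b>'
soup = BeautifulSoup(html, "html.parser")

def class_string(tag):
    """Hypothetical helper: all classes of `tag` as one string ('' if none)."""
    # tag.get() returns the default instead of raising KeyError
    # when the attribute is missing.
    return " ".join(tag.get("class", []))

results = [class_string(b) for b in soup.find_all("b")]
print(results)  # ['boldest big', '']
```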

Upvotes: 1
