user3664862

Reputation: 298

BeautifulSoup4: class attribute containing whitespace is not treated as a single string

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<div class="class1 class2 class3">...</div>', 'lxml')
>>> soup.find('div')['class']
['class1', 'class2', 'class3']

How can I force BS4 to treat the class attribute as a single string instead of splitting it into a list?

Upvotes: 1

Views: 56

Answers (1)

Padraic Cunningham

Reputation: 180391

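You could use xml as the parser (it requires lxml). The XML parser does not treat class as a multi-valued attribute, so the value comes back as one string. Keep in mind that the markup is then parsed as XML rather than HTML:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<div class="class1 class2 class3">...</div>', "xml")
print(soup.find('div')['class'])
class1 class2 class3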

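Or you could remove 'class' from builder.cdata_list_attributes['*']. This relies on Beautiful Soup internals, and because that list is shared, the change also applies to soups created afterwards (which is why the second call below sees it):

from bs4 import BeautifulSoup

# Remove 'class' by name rather than by position, so the order of the entries does not matter.
BeautifulSoup('', 'lxml').builder.cdata_list_attributes["*"].remove('class')

soup = BeautifulSoup('<div class="class1 class2 class3">...</div>', 'lxml')
print(soup.find('div')['class'])
class1 class2 class3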
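
If you would rather not touch the builder internals, here is a minimal sketch of two other options. It assumes a newer Beautiful Soup 4 release in which the constructor accepts the multi_valued_attributes argument, and it uses the built-in "html.parser" only so that lxml is not required; the variable names are just for illustration:

```python
from bs4 import BeautifulSoup

html = '<div class="class1 class2 class3">...</div>'

# Option 1: turn off multi-valued attribute handling for this soup only.
# Assumes a Beautiful Soup version that supports multi_valued_attributes.
soup = BeautifulSoup(html, "html.parser", multi_valued_attributes=None)
single = soup.find("div")["class"]
print(single)  # class1 class2 class3

# Option 2: keep the default behaviour and join the list back together.
# This works on any version, but whitespace is normalised to single spaces.
default_soup = BeautifulSoup(html, "html.parser")
joined = " ".join(default_soup.find("div")["class"])
print(joined)  # class1 class2 class3
```

Option 1 only affects the soup it is passed to, so other code that expects class to be a list keeps working.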

Upvotes: 1
