Dillon Bowen

Reputation: 366

How do I set a BeautifulSoup attribute while preserving HTML entities?

Setup:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<img/>', 'html.parser')

I need to do the following:

soup.img['src'] = 'url?x=1&y=2'

Desired behavior:

print(soup)
<img src="url?x=1&y=2"/>

Actual behavior:

print(soup)
<img src="url?x=1&amp;y=2"/>

In general, how do I set a Tag attribute to the literal string I'm passing in?

Upvotes: 0

Views: 125

Answers (2)

Jon Clements

Reputation: 142226

Your src attribute is indeed stored as 'url?x=1&y=2'. However, when you call print(soup), BeautifulSoup applies output formatting (escaping & as &amp;) to avoid producing possibly invalid HTML. If you don't want that, you can explicitly disable it, e.g.:

print(soup.decode(formatter=None))

Reference: the "Output formatters" section of the Beautiful Soup documentation
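A minimal, self-contained sketch of the above, assuming bs4 4.x with the built-in html.parser. It shows that assignment stores the literal string and that escaping happens only at output time, controlled by the formatter argument to decode(). The commented outputs match the ones shown in the question.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<img/>', 'html.parser')
soup.img['src'] = 'url?x=1&y=2'

# The stored attribute value is the literal string; assignment does no escaping.
stored = soup.img['src']

# The default "minimal" formatter escapes & as &amp; when serializing.
escaped = soup.decode()

# formatter=None turns off entity substitution when serializing.
raw = soup.decode(formatter=None)

print(stored)   # url?x=1&y=2
print(escaped)  # <img src="url?x=1&amp;y=2"/>
print(raw)      # <img src="url?x=1&y=2"/>
```

Keep in mind that formatter=None applies to the whole document, so a stray < or & in text nodes is also written out unescaped, which is exactly the kind of invalid HTML the default formatter guards against.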

Upvotes: 1

Jack Fleeting

Reputation: 24940

The &amp; in the printed output is just an escaped &; try something like this:

soup.img['src'].replace('&amp;','&')

Output:

'url?x=1&y=2'
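To illustrate why &amp; is "just an &", here is a sketch using Python's standard-library html module rather than BeautifulSoup (a swapped-in illustration, not what either answer uses). The example URL is hypothetical. It also shows that .replace('&amp;', '&') leaves a string unchanged if it was never escaped, which is the case for the value returned by soup.img['src'].

```python
import html

# Hypothetical example value: the literal URL as assigned to the attribute.
literal = 'url?x=1&y=2'

# Escaping turns & into &amp; (this string has no quotes or angle brackets).
escaped = html.escape(literal)

# Unescaping the serialized form recovers the original string.
roundtrip = html.unescape(escaped)

# Replacing '&amp;' in a string that was never escaped is a no-op.
unchanged = literal.replace('&amp;', '&')

print(escaped)    # url?x=1&amp;y=2
print(roundtrip)  # url?x=1&y=2
print(unchanged)  # url?x=1&y=2
```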

Upvotes: 0
