user2015487

Reputation: 2185

How to remove the XML encoding declaration from BeautifulSoup output?

I would like to know how I can remove the XML declaration (the <?xml ... encoding="utf-8"?> line) that prettify automatically adds in BeautifulSoup. Example:

tree='''<A attribute1="1" attribute2="2">
 <B>
  <C/>
 </B>
</A>'''
from bs4 import BeautifulSoup as Soup
# The 'lxml-xml' parser requires the lxml package
root = Soup(tree, 'lxml-xml')
print(root.prettify().replace('\n', ''))

The output looks like this:

<?xml version="1.0" encoding="utf-8"?><A attribute1="1" attribute2="2"> <B>  <C/> </B></A>

I would simply like:

<A attribute1="1" attribute2="2"> <B>  <C/> </B></A>

Upvotes: 1

Views: 1042

Answers (1)

Dean Fenster

Reputation: 2395

There are a few ways you can go about it:

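The first is to call root.decode_contents(), which returns only the contents of the document, without the XML declaration and without prettifying.

The second is to prettify each node in root.contents separately and then join the results, like this: '\n'.join(x.prettify() for x in root.contents). The declaration is added only when the top-level BeautifulSoup object itself is serialized, so its children come out without it.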
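Here is a minimal sketch of both options applied to the tree from the question. It assumes bs4 and lxml are installed; the variable names plain and pretty are only illustrative.

```python
from bs4 import BeautifulSoup

tree = '''<A attribute1="1" attribute2="2">
 <B>
  <C/>
 </B>
</A>'''

# The 'lxml-xml' parser requires the lxml package.
root = BeautifulSoup(tree, 'lxml-xml')

# Option 1: serialize only the children of the document object.
# No XML declaration is emitted and no pretty-printing is applied.
plain = root.decode_contents()

# Option 2: prettify each top-level node on its own, then join.
# Here root.contents holds just the <A> element.
pretty = '\n'.join(x.prettify() for x in root.contents)

print(plain.replace('\n', ''))
print(pretty)
```

Neither string should contain the <?xml ...?> line. Note that option 2 re-indents the markup, so its whitespace will differ from the input, while option 1 leaves the original whitespace alone.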

Upvotes: 2
