Reputation: 842
I need to edit hundreds of .html files with BeautifulSoup 4.
My CSS formatting is lost when I write the changes back to the files.
Before prettify(): [screenshot not preserved]
After prettify(): [screenshot not preserved]
My code:
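from bs4 import BeautifulSoup
import os

path = r"C:\Files"

# Collect full paths, since os.listdir() returns bare file names
files = []
for file in os.listdir(path):
    if file.endswith('.html'):
        files.append(os.path.join(path, file))

for htmlfile in files:
    # Read the file as UTF-8 and close it again before rewriting it
    with open(htmlfile, encoding="utf-8") as infile:
        soup = BeautifulSoup(infile, "html.parser")
    soup.header.decompose()
    soup.menu.decompose()
    # With an encoding argument prettify() returns bytes, hence mode "wb"
    pretty_html = soup.prettify('utf-8', 'minimal')
    with open(htmlfile, "wb") as outfile:
        outfile.write(pretty_html)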
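To narrow down where the formatting changes, here is a minimal sketch comparing the two ways of serializing the soup. The sample markup is made up for illustration and is not taken from my real files; it only assumes the built-in "html.parser".

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not from the real files
html = "<div><p>Hello   <b>world</b></p></div>"
soup = BeautifulSoup(html, "html.parser")

plain = str(soup)         # serializes the tree without adding whitespace
pretty = soup.prettify()  # puts each tag and string on its own indented line

print(plain == html)   # whether the plain output matches the input
print(pretty == html)  # whether the prettified output matches the input
print(pretty)
```

As far as I can tell, prettify() is what inserts the extra line breaks and indentation, while str(soup) leaves the whitespace alone.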
If I don't call prettify() and write it out as below:
with open(htmlfile, "w") as outfile:
    outfile.write(str(soup))
I get an encoding error:
    outfile.write(str(soup))
  File "...env\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position 2027: character maps to <undefined>
This seems to be a "utf-8" to "cp1252" encoding issue.
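The error can apparently be reproduced without BeautifulSoup at all. Below is a minimal sketch, assuming that open() without an encoding argument falls back to cp1252 on my machine, as the traceback suggests. The helper try_write and the file name demo.html are made up for the demonstration, and cp1252 is passed explicitly so the behaviour is the same on any system.

```python
import os
import tempfile

text = "a \u2192 b"  # contains the arrow character named in the traceback


def try_write(path, encoding):
    """Write text with the given encoding; return None on success, else the error."""
    try:
        with open(path, "w", encoding=encoding) as outfile:
            outfile.write(text)
        return None
    except UnicodeEncodeError as exc:
        return exc


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "demo.html")  # hypothetical file name
    # cp1252 is given explicitly to mimic the assumed Windows default
    cp1252_result = try_write(target, "cp1252")
    utf8_result = try_write(target, "utf-8")
    # Read the file back as UTF-8 to check the text survived
    with open(target, encoding="utf-8") as infile:
        round_trip = infile.read()

print(type(cp1252_result).__name__)
print(utf8_result)
print(round_trip == text)
```

If that assumption holds, the write fails only because cp1252 has no mapping for '\u2192', and opening the output file with encoding="utf-8" avoids the error.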
I can't wrap my head around this encoding stuff.
Upvotes: 2
Views: 862