joke4me

Reputation: 842

Python - CSS formatting lost after Beautiful Soup 4 prettify() on a local .html file

I need to edit hundreds of .html files with Beautiful Soup 4.

My CSS formatting is lost when I write the changes back to the file.

Before prettify(): [screenshot]

After prettify(): [screenshot]

My code:

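import os

from bs4 import BeautifulSoup

path = r"C:\Files"

# Collect full paths, since os.listdir() returns bare file names.
files = []
for file in os.listdir(path):
    if file.endswith('.html'):
        files.append(os.path.join(path, file))

for htmlfile in files:
    # Close the input file before it is reopened for writing.
    with open(htmlfile, encoding="utf-8") as infile:
        soup = BeautifulSoup(infile, "html.parser")

    # Remove the <header> and <menu> elements.
    soup.header.decompose()
    soup.menu.decompose()

    # prettify() with an encoding returns bytes, hence the "wb" mode.
    pretty_html = soup.prettify('utf-8', 'minimal')
    with open(htmlfile, "wb") as outfile:
        outfile.write(pretty_html)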

If I skip prettify() and write it out as below:

with open(htmlfile, "w") as outfile:
    outfile.write(str(soup))

I get an encoding error:

outfile.write(str(soup))
File "...env\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2192' in position 2027: character maps to <undefined>

This seems to be a "utf-8" to "cp1252" encoding issue.

I can't wrap my head around this encoding stuff.
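For reference, here is a minimal sketch of what the traceback appears to describe, assuming the default text codec on this machine is cp1252 (as the cp1252.py frame suggests). The sample string and the temporary output path are made up for illustration; the only thing taken from the traceback is the character '\u2192'. It uses the standard library only, so it does not depend on Beautiful Soup.

```python
import os
import tempfile

# Hypothetical sample markup containing U+2192, the character from the traceback.
text = "<p>next \u2192 page</p>"

# cp1252 has no mapping for U+2192, so encoding with it fails,
# which is what open(..., "w") does implicitly when cp1252 is the default.
try:
    text.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Passing encoding="utf-8" to open() bypasses the platform default codec.
out_path = os.path.join(tempfile.mkdtemp(), "example.html")  # hypothetical file
with open(out_path, "w", encoding="utf-8") as outfile:
    outfile.write(text)

# Read the file back as UTF-8 to confirm the character survived.
with open(out_path, encoding="utf-8") as infile:
    round_trip = infile.read()
```

If that assumption holds, the same encoding="utf-8" argument on the open(..., "w") call would probably let str(soup) be written without prettify(), which should also leave the original whitespace alone.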

Upvotes: 2

Views: 862

Answers (0)
