TacoCat

Reputation: 469

Encoding error when trying to write a file with Python

Here is the full script:

import requests
import bs4


res = requests.get('https://example.com')
soup = bs4.BeautifulSoup(res.text, 'lxml')
page_HTML_code = soup.prettify()

multiline_code = """{}""".format(page_HTML_code)

f = open("testfile.txt","w+")
f.write(multiline_code)
f.close()

I'm trying to write the entire downloaded HTML to a file while keeping it neat and clean.

I understand that the write fails because certain characters in the text can't be saved, but I'm not sure how to encode the text correctly.

Can anyone help?

This is the error message I get:

  File "C:\Location", line 16, in <module>
    f.write(multiline_code)
  File "C:\\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0421' in position 209: character maps to <undefined>
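
The failure can be reproduced without downloading anything. The following is a minimal sketch, assuming only that the default file encoding is cp1252 (as the traceback indicates) and that the text contains '\u0421', the character named in the error; the variable names are illustrative and not part of the original script.

```python
# U+0421 is the character named in the traceback.
text = "\u0421"

# cp1252 has no mapping for this character, so encoding fails
# the same way f.write() does when the file is opened with cp1252.
try:
    text.encode("cp1252")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    print(exc)

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)
```

This suggests the problem lies in the encoding chosen when the file is opened, not in the downloaded HTML itself.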

Upvotes: 1

Views: 615

Answers (1)

TacoCat

Reputation: 469

I did some digging around and this worked:

import requests
import bs4


res = requests.get('https://example.com')
soup = bs4.BeautifulSoup(res.text, 'lxml')
page_HTML_code = soup.prettify()

multiline_code = """{}""".format(page_HTML_code)

# Pass encoding='utf-8' when opening the file. Without it, open() used
# cp1252 here (see the traceback), which cannot encode '\u0421'.
with open('testfile.html', 'w+', encoding='utf-8') as fb:
    fb.write(multiline_code)
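
To check that the explicit encoding round-trips correctly, here is a small sketch that writes a sample string and reads it back. The sample string stands in for the output of soup.prettify(), and the temporary directory is an assumption made so the example runs without touching the network or the working directory.

```python
import os
import tempfile

# Stand-in for soup.prettify(); contains the character from the traceback.
sample_html = "<p>\u0421</p>\n"

# Hypothetical output location inside a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), "testfile.html")

# Write with an explicit encoding instead of the platform default.
with open(path, "w", encoding="utf-8") as fb:
    fb.write(sample_html)

# Read back with the same encoding to confirm nothing was lost.
with open(path, encoding="utf-8") as fb:
    round_trip = fb.read()

print(round_trip == sample_html)
```

The file must be read with the same encoding it was written with; opening it with the platform default can produce garbled text or a decode error.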

Upvotes: 1
