I scraped a webpage (name changed in code here) as follows:
import requests
r = requests.get('https://www.samplewebpage.com')
Then I tried to write r.text to a file as follows:
f = open ('filename', 'w')
f.write(r.text)
f.close()
I get the following error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u20b9' in position 158691: character maps to <undefined>
r.encoding shows UTF-8. How can I resolve this?
I have also tried the following:
- A few other random webpages; the code runs without any error for most of them.
- Using r.content.decode('utf-8', 'ignore') instead of r.text, but I get the same error as above.
My environment/system specifications:
Suspecting a console encoding mismatch, as I read in a similar question on this forum, I confirmed that the Atom console is set to UTF-8. However, I don't believe console encoding is the problem here, since I want to write to a file.
Thanks
Upvotes: 2
Views: 2249
Reputation: 20698
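The 'charmap' codec in the error suggests that open() is falling back to the platform's default encoding (typically a legacy code page such as cp1252 on Windows), which has no mapping for '\u20b9' (the rupee sign). r.text is already a decoded str, so r.encoding is not the issue; the failure happens when the file object encodes the text on write. Try explicitly specifying the file's encoding: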
f = open ('filename', 'w', encoding='utf8')
f.write(r.text)
f.close()
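As a minimal sketch of the same fix, the snippet below wraps the write in a with block so the file is closed even if the write fails. The helper names save_text and save_bytes, the sample string, and the temporary path are hypothetical and only for illustration; in the question's code you would pass r.text (or r.content) instead of the sample.

```python
import os
import tempfile


def save_text(text, path):
    """Write a str to path as UTF-8, ignoring the platform's default encoding."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)


def save_bytes(raw, path):
    """Write raw bytes (for example r.content) unchanged; no encoding step is involved."""
    with open(path, 'wb') as f:
        f.write(raw)


# Stand-in for r.text: a string containing the rupee sign from the error message.
sample = 'Price: \u20b9 500'

# Hypothetical output location, used here so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), 'page.html')

save_text(sample, path)

# Read the file back with the same encoding to confirm the character survived.
with open(path, encoding='utf-8') as f:
    round_trip = f.read()
```

If you only need the page saved exactly as the server sent it, save_bytes(r.content, path) sidesteps text encoding entirely, at the cost of keeping whatever encoding the server used.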
Upvotes: 3