Reputation: 3550
Before someone says this is a duplicate question, I just want to let you know that the error I am getting from running this program in command line is different from all the other related questions I've seen.
I am trying to run a very short script in Python:
from bs4 import BeautifulSoup
import urllib.request
html = urllib.request.urlopen("http://dictionary.reference.com/browse/word?s=t").read().strip()
dhtml = str(html, "utf-8").strip()
soup = BeautifulSoup(dhtml.strip(), "html.parser")
print(soup.prettify())
But I keep getting an error when I run this program with python.exe: UnicodeEncodeError: 'charmap' codec can't encode character '\u025c'. I have tried a lot of methods to get around this, but I managed to isolate it to the problem of converting bytes to strings. When I run this program in IDLE, I get the HTML as expected. What is it that IDLE is automatically doing? Can I use IDLE's interpretation program instead of python.exe? Thanks!
My problem is caused by print(soup.prettify()), but type(soup.prettify()) returns str?
Upvotes: 0
Views: 1572
Reputation: 3550
I finally made a decision to use encode() and decode() because of the trouble that has been caused. If someone knows how to actually resolve a question, please do; also, thank you for all your answers
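For readers wondering what such an encode()/decode() round trip might look like, here is a minimal sketch. The helper name printable and the choice of the backslashreplace error handler are assumptions made for illustration, not necessarily what was used above, and cp437 only stands in for a limited console code page.

```python
import sys

def printable(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by backslash escapes."""
    # Hypothetical helper: defaults to whatever encoding stdout reports.
    encoding = encoding or sys.stdout.encoding or "ascii"
    # Encode with an error handler instead of failing, then decode back to str.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)

# cp437 is used here only as an example of a limited console code page.
print(printable("w\u025crd", encoding="cp437"))
```

The result is still a str, so it can be passed to print() once per page rather than calling .encode() on every element.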
Upvotes: 0
Reputation: 414079
UnicodeEncodeError: 'charmap' codec can't encode character '\u025c'
The console character encoding can't represent '\u025c', i.e., the "ɜ" Unicode character (U+025C LATIN SMALL LETTER REVERSED OPEN E).
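The failure can be reproduced without a console at all. The sketch below checks whether a few codecs can encode the character. The helper can_encode is a name made up for this example, and cp437 and cp1252 are assumed here as typical Windows code pages; the asker's actual console code page is not stated.

```python
import unicodedata

ch = "\u025c"
print(unicodedata.name(ch))  # LATIN SMALL LETTER REVERSED OPEN E

def can_encode(text, encoding):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Legacy code pages lack the character; UTF-8 can represent any code point.
for enc in ("cp437", "cp1252", "utf-8"):
    print(enc, can_encode(ch, enc))
```

print() hits the same UnicodeEncodeError when sys.stdout uses one of the legacy code pages.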
What is it that IDLE is automatically doing?
IDLE displays Unicode directly (only BMP characters) if the corresponding font supports the given Unicode characters.
Can I use IDLE's interpretation program instead of python.exe
Yes, run:
T:\> py -midlelib -r your_script.py
Note: you could write arbitrary Unicode characters to the Windows console if Unicode API is used:
T:\> py -mpip install win-unicode-console
T:\> py -mrun your_script.py
See What's the deal with Python 3.4, Unicode, different languages and Windows?
Upvotes: 3
Reputation: 536349
I just want to let you know that the error I am getting from running this program in command line is different from all the other related questions I've seen.
Not really. You have PrintFails like everyone else.
The Windows console can't print Unicode. (This isn't strictly true, but going into exactly why, when and how you can get Unicode out of the console is a painful exercise and not usually worth it.) Trying to print a character that isn't in the console's limited encoding can't work, so Python gives you an error.
print them out (which I need an easier solution to because I cannot do .encode("utf-8") for a lot of elements
You could run the command set PYTHONIOENCODING=utf-8 before running the script to tell Python to use an encoding which can include any character (so no errors), but any non-ASCII output will still come out garbled as its encoding won't match the console's actual code page.
(Or indeed just use IDLE.)
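A different technique from changing the encoding is to keep the console's encoding but change how unencodable characters are handled, so that they are escaped instead of raising. The following is only a sketch under assumptions: tolerant_writer is a hypothetical helper, and an in-memory buffer with cp437 stands in for the real console.

```python
import io

def tolerant_writer(buffer, encoding, errors="backslashreplace"):
    """Wrap a binary stream in a text stream that escapes unencodable characters."""
    return io.TextIOWrapper(buffer, encoding=encoding, errors=errors,
                            line_buffering=True)

# Stand-in for a console limited to code page 437 (an assumption for the demo).
fake_console = io.BytesIO()
out = tolerant_writer(fake_console, "cp437")

out.write("w\u025crd")  # would raise UnicodeEncodeError with errors="strict"
out.flush()
print(fake_console.getvalue())  # b'w\\u025crd'
```

In a real script the same idea would be sys.stdout = tolerant_writer(sys.stdout.buffer, sys.stdout.encoding), or, on Python 3.7 and later, sys.stdout.reconfigure(errors="backslashreplace"). The characters still won't display as themselves, but the output stays readable ASCII rather than garbled bytes.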
Upvotes: 1