rassa45

Reputation: 3550

Python program is running in IDLE but not in command line

Before someone says this is a duplicate question, I just want to let you know that the error I am getting from running this program in command line is different from all the other related questions I've seen.

I am trying to run a very short Python script:

from bs4 import BeautifulSoup
import urllib.request




html = urllib.request.urlopen("http://dictionary.reference.com/browse/word?s=t").read().strip()
dhtml = str(html, "utf-8").strip()
soup = BeautifulSoup(dhtml.strip(), "html.parser")
print(soup.prettify())

But I keep getting an error when I run this program with python.exe: UnicodeEncodeError: 'charmap' codec can't encode character '\u025c'. I have tried many ways to get around this, and I managed to isolate it to the problem of converting bytes to strings. When I run this program in IDLE, I get the HTML as expected. What is it that IDLE is automatically doing? Can I use IDLE's interpretation program instead of python.exe? Thanks!

EDIT:

My problem is caused by print(soup.prettify()) but type(soup.prettify()) returns str?

RESOLVED:

I finally decided to use encode() and decode() because of all the trouble this has caused. If someone knows how to actually resolve a question, please do; also, thank you for all your answers.

Upvotes: 0

Views: 1572

Answers (3)

rassa45

Reputation: 3550

I finally decided to use encode() and decode() because of all the trouble this has caused. If someone knows how to actually resolve a question, please do; also, thank you for all your answers.
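The post does not show the actual encode()/decode() code, so the following is only a minimal sketch of what such a round trip might look like. The helper name to_console_safe, the choice of the backslashreplace error handler, and cp437 as the console encoding are all assumptions made for illustration.

```python
import sys


def to_console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks turned into escapes.

    Hypothetical helper: not part of the original post.
    """
    # Use the given encoding, else whatever stdout reports, else plain ASCII.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Encoding with backslashreplace never raises; decoding with the same
    # codec gives back a str that the console is able to print.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


sample = "b\u025crd"  # contains U+025C, which cp437 cannot represent
print(to_console_safe(sample, "cp437"))
```

With this approach the unsupported character shows up as a literal escape sequence instead of raising UnicodeEncodeError; it does not make the console display the real glyph.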

Upvotes: 0

jfs

Reputation: 414079

UnicodeEncodeError: 'charmap' codec can't encode character '\u025c'

The console's character encoding can't represent '\u025c', i.e. the Unicode character "ɜ" (U+025C LATIN SMALL LETTER REVERSED OPEN E).
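As a rough illustration of that point, the check below tries to encode the character with a few codecs. The asker's actual console code page is not stated, so cp437 and cp1252 are assumed here as typical Windows examples, and can_encode is a hypothetical helper name.

```python
def can_encode(ch, codec):
    """Report whether codec has a byte representation for ch (hypothetical helper)."""
    try:
        ch.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


ch = "\u025c"  # LATIN SMALL LETTER REVERSED OPEN E
for codec in ("cp437", "cp1252", "utf-8"):
    # Only the result is printed, never the character itself, so this
    # loop runs without error on a legacy console as well.
    print(codec, can_encode(ch, codec))
```

The two code-page codecs are table-driven ("charmap") codecs with no entry for U+025C, which is where the 'charmap' wording in the error message comes from.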

What is it that IDLE is automatically doing?

IDLE displays Unicode directly (BMP characters only), provided the font in use supports the given characters.

Can I use IDLE's interpretation program instead of python.exe

Yes, run:

T:\> py -midlelib -r your_script.py

Note: you could write arbitrary Unicode characters to the Windows console if Unicode API is used:

T:\> py -mpip install win-unicode-console
T:\> py -mrun your_script.py

See What's the deal with Python 3.4, Unicode, different languages and Windows?

Upvotes: 3

bobince

Reputation: 536349

I just want to let you know that the error I am getting from running this program in command line is different from all the other related questions I've seen.

Not really. You have PrintFails like everyone else.

The Windows console can't print Unicode. (This isn't strictly true, but going into exactly why, when and how you can get Unicode out of the console is a painful exercise and not usually worth it.) Trying to print a character that isn't in the console's limited encoding can't work, so Python gives you an error.

print them out (which I need an easier solution to because I cannot do .encode("utf-8") for a lot of elements

You could run the command set PYTHONIOENCODING=utf-8 before running the script to tell Python to use an encoding that can represent any character (so no errors), but any non-ASCII output will still come out garbled, as its encoding won't match the console's actual code page.
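A related option, not mentioned in the answer and offered here only as a sketch, is to keep the console's own encoding but relax the stream's error handler from inside the script. This assumes Python 3.7 or later, where io.TextIOWrapper has a reconfigure() method; make_tolerant is a hypothetical helper name, and the in-memory cp437 stream merely stands in for a legacy console.

```python
import io


def make_tolerant(stream, errors="backslashreplace"):
    """Switch a text stream to a lenient error handler, keeping its encoding.

    Hypothetical helper; streams without reconfigure() are left unchanged.
    """
    if hasattr(stream, "reconfigure"):  # io.TextIOWrapper, Python 3.7+
        stream.reconfigure(errors=errors)
    return stream


# Simulate a console that only understands cp437 (an assumed code page).
raw = io.BytesIO()
fake_console = make_tolerant(io.TextIOWrapper(raw, encoding="cp437"))
fake_console.write("b\u025crd")
fake_console.flush()
print(raw.getvalue())  # the bytes that would have reached the console
```

In a real script the call would presumably be make_tolerant(sys.stdout) near the top, after which print(soup.prettify()) would emit escapes for unsupported characters instead of failing.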

(Or indeed just use IDLE.)

Upvotes: 1
