Reputation: 465
On Python 2.7,
import os
for dir in os.listdir("E:/Library/Documents/Old - Archives/Case"):
    print dir
prints out:
Danny.xlsx
Dannyh.xlsx
~$??? ?? ?????? ??? ???? ???????.docx
while this:
# using a unicode literal
for dir in os.listdir(u"E:/Library/Documents/Old - Archives/Case"):
    print dir
prints out:
Dan.xlsx
Dann.xlsx
Traceback (most recent call last):
  File "E:\...\FirstModule.py", line 31, in <module>
    print dir
  File "C:\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 2-4: character maps to <undefined>
The file's name is in Hebrew, as such: המסמך.xls
How can I make it appear in Hebrew in Python too?
Upvotes: 4
Views: 5103
Reputation: 465
Solved it: adding # -*- coding: utf-8 -*- at the top of the source file fixed it.
Upvotes: 2
Reputation: 177725
The problem is that your output console uses the cp1252 encoding, per your error message, and Hebrew cannot be printed under that encoding. Use an IDE that supports UTF-8, with a font that supports Hebrew, and os.listdir with a Unicode path will print correctly.
Here's an example from the PythonWin IDE with and without a Unicode path.
PythonWin 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32.
Portions Copyright 1994-2008 Mark Hammond - see 'Help/About PythonWin' for further copyright information.
>>> import os
>>> for f in os.listdir('.'):
...     print f
...
x.exe
x.py
x.pyc
y.py
?????.xls
>>> for f in os.listdir(u'.'):
...     print f
...
x.exe
x.py
x.pyc
y.py
המסמך.xls
Also note that an encoding declaration in your source file does nothing for generating output. It only declares what encoding the source file is saved in, which affects the ability to write non-ASCII characters in the source file.
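If switching to a different console or IDE is not an option, one workaround is to escape whatever the console encoding cannot represent, so that the loop at least finishes without a UnicodeEncodeError. This is only a minimal sketch: console_safe is a hypothetical helper name, and the cp1252 default is an assumption taken from the traceback in the question.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def console_safe(name, encoding="cp1252"):
    """Escape characters that `encoding` cannot represent (hypothetical helper)."""
    # "backslashreplace" turns unencodable characters into \uXXXX escapes
    # instead of raising UnicodeEncodeError.
    return name.encode(encoding, "backslashreplace").decode(encoding)


hebrew_name = "\u05d4\u05de\u05e1\u05de\u05da.xls"  # the Hebrew name from the question

print(console_safe("Dan.xlsx"))    # ASCII names pass through unchanged
print(console_safe(hebrew_name))   # Hebrew letters come out as \uXXXX escapes
```

The escaped name is not readable Hebrew, but the Unicode string returned by os.listdir is untouched and can still be passed to open() or os.rename().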
Upvotes: 1
Reputation: 536409
The version with the u'' string literal works fine: ask with a Unicode pathname and you'll get a Unicode pathname in response, allowing you to work with filenames that include characters outside the current code page.
Your problem comes solely from trying to print the filename. Getting Unicode output to the Windows Command Prompt is a trial.
The default C standard library print function is limited to the locale code page. Unless you call the Win32 API function WriteConsoleW directly (using ctypes), you're never going to get reliable console Unicode support; and even then it won't work unless a suitable non-default font is chosen. This affects pretty much all non-native command line tools, not just Python.
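A rough sketch of the WriteConsoleW route through ctypes might look like the following. write_console is a hypothetical helper, not part of any library. It assumes stdout is attached to a real console (the call fails when output is redirected) and that the console font can display the characters.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import ctypes
import sys

STD_OUTPUT_HANDLE = -11  # Win32 identifier for the standard output device


def write_console(text):
    """Try to write `text` with WriteConsoleW; return True on success.

    Hypothetical helper: returns False off Windows, or when stdout is not
    a real console (for example, when output is redirected to a file).
    """
    if sys.platform != "win32":
        return False
    kernel32 = ctypes.windll.kernel32
    kernel32.GetStdHandle.restype = ctypes.c_void_p  # HANDLE is pointer-sized
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    written = ctypes.c_ulong(0)
    # The third argument counts UTF-16 code units; len(text) matches that
    # for characters in the Basic Multilingual Plane, such as Hebrew.
    ok = kernel32.WriteConsoleW(
        ctypes.c_void_p(handle), ctypes.c_wchar_p(text), len(text),
        ctypes.byref(written), None)
    return bool(ok)


if not write_console("\u05d4\u05de\u05e1\u05de\u05da.xls\n"):
    print("WriteConsoleW is not available here")
```

Returning a boolean rather than raising lets the caller fall back to an ordinary print when the script runs outside a Windows console.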
Upvotes: 6