Reputation: 1015
My problem:
- Start with a US Windows 10 install
- Create a Japanese filename in Windows Explorer
- Open the Python shell and run os.listdir('.')
- The listed filename is full of question marks.
os.path.exists() unsurprisingly reports that the file is not found.
NTFS stores the filename as Unicode. I'm sure that if I used the win32api CreateFile() family of functions I would get my Unicode filename back; however, those APIs are too cumbersome (and not portable). I'd prefer to get UTF-8 encoded filenames, or the raw Unicode bytes from the filesystem's directory structure, but by default this doesn't seem to happen.
I have tried playing around with setlocale(), but I haven't stumbled upon the correct arguments to make my program work. I do not want to (and cannot) install additional code pages onto the Windows machine. This needs to work with a stock install of Windows.
Please note this has nothing to do with the console. A repr() shows that the ? chars that end up in the filename listed by os.listdir('.') are real question marks and not some display artifact. I assume they have been added by the API that listdir() uses under the hood.
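To illustrate the kind of lossy conversion I suspect is happening, here is a minimal sketch. It does not call the Windows API at all; it only assumes that the names are squeezed through a single-byte code page (cp1252 is my guess for a US install), and the filename is a made-up example.

```python
# Hypothetical Japanese filename, written with escapes so the source stays ASCII.
name = u"\u65e5\u672c\u8a9e.txt"

# Assumption: the directory listing is converted to a single-byte "ANSI" code page.
# cp1252 has no Japanese characters, so each one is replaced by a literal "?".
lossy = name.encode("cp1252", errors="replace")

# Decoding again cannot bring the original characters back.
round_tripped = lossy.decode("cp1252")

print(repr(lossy))
print(round_tripped == name)
```

If this is what happens, the question marks are part of the returned name itself, which would explain why os.path.exists() cannot find the file.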
Upvotes: 2
Views: 1333
Reputation: 3046
You may see ?s when displaying that filename in the console via os.listdir(), but you can still access the file through that name without any problems, because internally everything is stored in binary. If you copy the displayed filename and paste it directly into Python, however, it will be interpreted as literal question marks.
If you want to open that file and operate on it, have a look at this:
import os

files = os.listdir(".")
# Possible output:
# ["a.txt", "file.py", ..., "??.html"]
filename = files[-1]  # The last file in this case
# Sample file operation; the with block closes the file automatically
with open(filename, 'r') as f:
    lines = f.readlines()
print(lines)
EDIT:
In Python 2, you need to pass the current path as a Unicode string, which can be done with os.listdir(u'.'), where the . means the current directory. This returns the list of filenames as Unicode strings.
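Here is a rough sketch of the round trip with a text path. It is written for Python 3, where ordinary string literals are already Unicode, so no u prefix is needed on the path. The helper name list_names_as_text and the sample filename are made up for illustration, and the file is created in a temporary directory so nothing else is touched.

```python
import os
import tempfile

def list_names_as_text(directory):
    # Hypothetical helper: a text (Unicode) path argument makes os.listdir()
    # return text names. In Python 2 the argument would need to be a u'' string.
    return os.listdir(directory)

workdir = tempfile.mkdtemp()      # fresh, empty scratch directory
name = u"\u65e5\u672c\u8a9e.txt"  # made-up Japanese filename
with open(os.path.join(workdir, name), "w", encoding="utf-8") as f:
    f.write(u"hello\n")

names = list_names_as_text(workdir)
# The name from the listing can be passed straight back to the os.path functions.
found = os.path.exists(os.path.join(workdir, names[0]))

# If UTF-8 bytes are wanted, encode the text name explicitly.
utf8_name = names[0].encode("utf-8")

print(names == [name], found)
```

The same idea should carry over to Python 2 by passing u'.' (or another Unicode path) to os.listdir(), as described above.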
Upvotes: 2