Reputation:
I don't seem to be able to open a file which has a Unicode filename. Let's say I do:
for i in os.listdir():
    open(i, 'r')
When I search for a solution, I always find pages about how to read and write a Unicode string to a file, not how to use file() or open() to open a file which has a Unicode name.
Upvotes: 12
Views: 39389
Reputation: 88977
Simply pass open() a Unicode string for the file name:
In Python 2.x:
>>> open(u'someUnicodeFilenameλ')
<open file u'someUnicodeFilename\u03bb', mode 'r' at 0x7f1b97e70780>
In Python 3.x, all strings are Unicode, so there is literally nothing to it.
As always, note that the best way to open a file is to use the with statement in conjunction with open(), so that the file is closed automatically.
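As a minimal sketch of the with-statement advice under Python 3, the snippet below writes and then reads a file whose name contains a non-ASCII character. The helper name read_text, the temporary directory, and the UTF-8 content encoding are assumptions made for illustration, and it presumes a filesystem that accepts such names:

```python
import os
import tempfile


def read_text(path):
    # Hypothetical helper: the with statement closes the file even if read() raises.
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


# Demo setup (assumed, not from the original post): a scratch directory
# holding one file whose name ends in a Greek lambda.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "someUnicodeFilename\u03bb")
with open(demo_path, "w", encoding="utf-8") as handle:
    handle.write("hello")

content = read_text(demo_path)
```

Note that the encoding argument here controls how the file's contents are decoded; it has nothing to do with how the filename is handled.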
Edit: With regard to os.listdir(), the advice again varies by version. Under Python 2.x, you have to be careful:
os.listdir(), which returns filenames, raises an issue: should it return the Unicode version of filenames, or should it return 8-bit strings containing the encoded versions? os.listdir() will do both, depending on whether you provided the directory path as an 8-bit string or a Unicode string. If you pass a Unicode string as the path, filenames will be decoded using the filesystem’s encoding and a list of Unicode strings will be returned, while passing an 8-bit path will return the 8-bit versions of the filenames.
So in short, if you want Unicode out, put Unicode in:
>>> os.listdir(".")
['someUnicodeFilename\xce\xbb', 'old', 'Dropbox', 'gdrb']
>>> os.listdir(u".")
[u'someUnicodeFilename\u03bb', u'old', u'Dropbox', u'gdrb']
Note that the file will still open either way. As an 8-bit string the name won't be represented well within Python, but passing it to open() still works:
>>> open('someUnicodeFilename\xce\xbb')
<open file 'someUnicodeFilenameλ', mode 'r' at 0x7f1b97e70660>
Under Python 3.x, passing os.listdir() an ordinary str path returns Unicode str filenames, so no special handling is needed.
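For the loop in the question, a Python 3 version might look like the sketch below. The function name read_all and the demo directory are hypothetical, and the files are assumed to contain UTF-8 text. It also joins each name back onto the directory, since os.listdir() returns bare filenames rather than full paths:

```python
import os
import tempfile


def read_all(directory):
    """Return {filename: contents} for every regular file in directory."""
    contents = {}
    for name in os.listdir(directory):  # str path in, str names out
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "r", encoding="utf-8") as handle:
                contents[name] = handle.read()
    return contents


# Demo setup (assumed): two files, one with a non-ASCII name.
demo_dir = tempfile.mkdtemp()
for name, text in [("someUnicodeFilename\u03bb", "lambda"), ("old", "plain")]:
    with open(os.path.join(demo_dir, name), "w", encoding="utf-8") as handle:
        handle.write(text)

result = read_all(demo_dir)
```

Without the os.path.join() step, the loop only works when the current working directory happens to be the directory being listed.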
Upvotes: 27
Reputation: 4448
You can try this:
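import os
import sys

directory = u"/your-directory-path/"
for filename in os.listdir(directory):
    # listdir() returns bare names, so join them back onto the directory.
    path = os.path.join(directory, filename)
    f = open(path.encode(sys.getfilesystemencoding()), "r")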
Upvotes: 7