Reputation: 61
I've read quite a bit on the topic already, including what seems to be the definitive guide: http://docs.python.org/howto/unicode.html
Perhaps for a more experienced developer, that guide may be enough. However, in my case, I'm more confused than when I started and still haven't resolved my issue.
I am trying to read filenames using os.walk() and to obtain certain information about the files (such as file size) before writing that information to a text file. This works as long as I don't run into any files whose names contain non-ASCII (Unicode) characters. When it hits a file with such a name, I get an error like this one:
WindowsError: [Error 123] The filename, directory name, or volume label syntax is incorrect: 'Documents\\??.txt'
In that case, the file was named 唽咿.txt.
Here is how I have been trying to do it so far:
    for (root, dirs, files) in os.walk(dirpath):
        for filename in files:
            filepath = os.path.join(root, filename)
            filesize = os.stat(filepath).st_size
            file = open(filepath, 'rb')
            stuff = get_stuff(filesize, file)
            file.close()
In case it matters, dirpath came from an earlier portion of code that amounts to 'dirpath = raw_input()'.
I've tried various things such as changing the filepath line to:
filepath = unicode(os.path.join(unicode(root), unicode(filename)))
But nothing I have tried has worked.
Here are my two questions:
How can I get it to pass the correct filename to the os.stat() method so that I can get a correct response from it?
My script needs to write some filenames into a text file that it may later want to read from. At that point it needs to be able to find the file based on what it just read from the text file. How do I write such filenames to a text file properly and then read from it properly later?
Upvotes: 4
Views: 1310
Reputation: 61
For those interested in the full solution:
dirpath = raw_input()
was changed to:
dirpath = raw_input().decode(sys.stdin.encoding)
That made the argument passed to os.walk() a unicode object, which in turn caused the filenames it returned to be unicode as well.
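The change described above can be sketched roughly as follows. This is only an illustration: the helper names to_text and list_file_sizes are made up for this example, and the fallback to UTF-8 when sys.stdin reports no encoding is an assumption, not something from the original script. It is written so that it should run on both Python 2 and Python 3.

```python
import os
import sys


def to_text(path, encoding=None):
    """Return path as a text (unicode) string, decoding byte strings."""
    if isinstance(path, bytes):  # on Python 2, a plain str is a byte string
        # Assumption: fall back to UTF-8 if stdin does not report an encoding.
        encoding = encoding or getattr(sys.stdin, "encoding", None) or "utf-8"
        return path.decode(encoding)
    return path


def list_file_sizes(dirpath, encoding=None):
    """Walk dirpath and return a list of (filepath, size_in_bytes) pairs."""
    results = []
    # Passing a text path makes os.walk() yield text names, so non-ASCII
    # filenames reach os.stat() intact instead of as '??'.
    for root, dirs, files in os.walk(to_text(dirpath, encoding)):
        for filename in files:
            filepath = os.path.join(root, filename)
            results.append((filepath, os.stat(filepath).st_size))
    return results
```

With this in place, something like list_file_sizes(raw_input()) on Python 2 would decode the typed path before walking it.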
To write these filenames to a file and read them back later (my second question), I used codecs.open().
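A minimal sketch of that round trip is below. Note that it uses io.open rather than codecs.open: for this purpose the two behave similarly, and io.open is available on Python 2.6+ and Python 3. The function names save_names and load_names are hypothetical, and the choice of UTF-8 with one filename per line is an assumption about the file layout. The names passed in are expected to be unicode strings already.

```python
import io


def save_names(names, list_path):
    """Write one filename per line, encoded as UTF-8."""
    with io.open(list_path, "w", encoding="utf-8") as out:
        for name in names:
            out.write(name + u"\n")


def load_names(list_path):
    """Read the filenames back as text (unicode) strings."""
    with io.open(list_path, "r", encoding="utf-8") as src:
        # Strip only the line terminator so spaces in names survive.
        return [line.rstrip(u"\n") for line in src]
```

Because the names come back as unicode, they can be passed straight to os.stat() or open() again. The one-name-per-line layout would not cope with a filename that itself contains a newline.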
Upvotes: 2
Reputation: 799580
Pass a unicode path to os.walk().
Changed in version 2.3: On Windows NT/2k/XP and Unix, if path is a Unicode object, the result will be a list of Unicode objects.
Upvotes: 2