Reputation: 291
I'm using os.walk to create a list of all music files under a folder. Some of these filenames are non-ASCII, for example:
01 空即是色.mp3
I'm using the mutagen library to parse metadata for this file, and it claims full Unicode support. The filename is retrieved as unicode and prints correctly. However, no matter what I do (including normalising the Unicode beforehand, or encoding it as UTF-8 beforehand), mutagen attempts to open() either
01 \xe7\xa9\xba\xe5\x8d\xb3\xe6\x98\xaf\xe8\x89\xb2.mp3
or
01 \u7a7a\u5373\u662f\u8272.mp3
How can I force it to open() the correct filename (the one it is perfectly capable of printing)?
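The two strings above are not two different filenames but two representations of the same one: the first is the UTF-8 byte encoding of the name, the second is its ASCII-safe escape form. A minimal sketch, assuming Python 3 (where str is always Unicode), that shows both and checks they round-trip to the original name:

```python
# Assumes Python 3: str holds Unicode text, bytes holds encoded data.
name = "01 空即是色.mp3"

# The \xe7\xa9\xba... form is the UTF-8 byte encoding of the name.
utf8_bytes = name.encode("utf-8")
print(utf8_bytes)  # b'01 \xe7\xa9\xba\xe5\x8d\xb3\xe6\x98\xaf\xe8\x89\xb2.mp3'

# The \u7a7a... form is ascii(), which escapes non-ASCII characters.
escaped = ascii(name)
print(escaped)  # '01 \u7a7a\u5373\u662f\u8272.mp3'

# Decoding the bytes gives back exactly the same string.
assert utf8_bytes.decode("utf-8") == name
```

So seeing either form in an error message does not by itself mean the encoding is wrong; the characters are intact.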
The full code is here.
Note: I am rather new to Python and to programming in general, so any advice regarding my code would be much appreciated. Thanks in advance!
EDIT: Okay, this was a rather embarrassing error on my part: the problem was not the character encoding but the fact that the directory path was not being prepended to the filename in the open() call. How do I get the full path for a file found via walk()? The files are 2-3 directories deep.
Upvotes: 1
Views: 787
Reputation: 328624
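Note that os.walk() yields each filename without its path. To open a file, you must join it to dirpath, the directory walk() is currently visiting (not just to the top-level folder, since your files are nested several levels deep):
```python
import os

music_dir = "Music"  # root folder to scan (example path)
for dirpath, dirnames, filenames in os.walk(music_dir):
    for filename in filenames:
        path = os.path.join(dirpath, filename)  # full path, usable with open()
```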
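Applied to the original goal of listing music files, a minimal sketch might look like the following, assuming Python 3. The function name find_music_files and the extension tuple are hypothetical choices for illustration, not part of mutagen or the asker's code. It walks the tree, keeps files with a matching extension, and returns full paths that open() (or a library such as mutagen) can use directly:

```python
import os
import tempfile

# Hypothetical set of extensions treated as "music files".
MUSIC_EXTENSIONS = (".mp3", ".flac", ".ogg")


def find_music_files(top, extensions=MUSIC_EXTENSIONS):
    """Return full paths of files under `top` whose names end in one of `extensions`."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            # Compare case-insensitively so "SONG.MP3" is also picked up.
            if filename.lower().endswith(extensions):
                # Join with dirpath (the directory walk() is in), not with top.
                paths.append(os.path.join(dirpath, filename))
    return paths


# Usage: build a throwaway tree two directories deep with a non-ASCII name.
with tempfile.TemporaryDirectory() as top:
    album = os.path.join(top, "Artist", "Album")
    os.makedirs(album)
    song = os.path.join(album, "01 空即是色.mp3")
    with open(song, "wb") as f:
        f.write(b"fake mp3 data")
    with open(os.path.join(album, "cover.jpg"), "wb") as f:
        f.write(b"fake image")

    found = find_music_files(top)
    print(found)
    # Every returned path can be opened directly, however deep it is.
    for path in found:
        with open(path, "rb") as f:
            data = f.read()
```

Because each dirpath already starts with top, os.path.join(dirpath, filename) produces a complete path regardless of nesting depth, and the non-ASCII name needs no special handling.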
Upvotes: 2