Borut Flis

Opening files in a directory with Python: trouble with encoding

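import os

# Directory taken from the error message below
path = "C:\\Users\\Borut\\Downloads\\GC downloads\\izvoz"
listing = os.listdir(path)
for infile in listing:
    print infile
    f = open(os.path.join(path, infile), 'r')
    f.close()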

I have written a Python script that iterates through all the files in a directory and opens them. It works, but it fails on some file names. One file is named Trade_Map_-_List_of_products_exported_by_Côte_d'Ivoire, and when the script tries to open it I get this error:

IOError: [Errno 2] No such file or directory: "C:\\Users\\Borut\\Downloads\\GC downloads\\izvoz\\Trade_Map_-_List_of_products_exported_by_Co^te_d'Ivoire.txt"

The real name ends in Côte_d'Ivoire, but the name returned by os.listdir ends in Co^te_d'Ivoire. What is wrong?

Answers (1)

Marc-Olivier Titeux

The type of the names returned by os.listdir(path) depends on the type of path. If path is a unicode string, the entries are unicode; otherwise they are byte strings in the filesystem encoding. On Windows that is the ANSI code page, and characters it cannot represent are replaced by approximations, which is most likely how a name such as Co^te_d'Ivoire, which does not exist on disk, ends up in the list. If you want to be sure to get the real file names, you could try the following (untested):

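import os
import sys

# path is the byte-string directory from the question. Convert it to
# unicode so that os.listdir() also returns unicode names
# (unicode() raises TypeError if path is already unicode, hence the check).
if not isinstance(path, unicode):
    path = unicode(path, sys.getfilesystemencoding())

# All elements of listing will be unicode.
listing = os.listdir(path)
for infile in listing:
    # repr() avoids a UnicodeEncodeError when the console cannot
    # display some character of the name.
    print repr(infile)

    # Because infile is unicode, open() receives the exact file name
    # instead of a lossy byte-string approximation of it.
    with open(os.path.join(path, infile), 'r') as f:
        pass  # read from f here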
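
The code above is Python 2. For comparison, here is a minimal Python 3 sketch of the same loop, assuming Python 3.7 or later. In Python 3 every str is Unicode, so a str path makes os.listdir() return str names and a file such as the Côte_d'Ivoire one opens directly; only a bytes path gives bytes names. The helper name read_all_files is hypothetical, and files are opened in binary mode because only the file names, not their contents, are at issue here:

```python
import os
import tempfile


def read_all_files(path):
    """Hypothetical helper: open every regular file in `path`.

    Returns {name: contents as bytes}. Because `path` is a str,
    os.listdir() returns str names, so non-ASCII characters survive intact.
    """
    contents = {}
    for name in os.listdir(path):
        full_path = os.path.join(path, name)
        if not os.path.isfile(full_path):
            continue  # skip subdirectories and other non-files
        # Binary mode: the file *name* is the problem, not the text encoding.
        with open(full_path, "rb") as f:
            contents[name] = f.read()
    return contents


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        name = "Trade_Map_-_List_of_products_exported_by_Côte_d'Ivoire.txt"
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(b"data")
        found = read_all_files(tmp)
        print(name in found)                          # True: str path -> str names
        print(type(os.listdir(os.fsencode(tmp))[0]))  # <class 'bytes'>
```

Passing a str path everywhere is the Python 3 equivalent of the unicode() conversion in the Python 2 code above.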

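sys.getfilesystemencoding() returns the encoding Python uses to convert between unicode file names and byte-string file names ('mbcs', i.e. the ANSI code page, on Windows under Python 2). open() and the os functions accept file names in either form: byte strings are passed to the operating system as they are, while unicode strings are encoded with this encoding on POSIX systems and passed directly to the Unicode (wide-character) API on Windows, so no characters are lost there.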
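
In Python 3 this conversion is available directly as os.fsencode() and os.fsdecode(), which use sys.getfilesystemencoding() together with the error handler from sys.getfilesystemencodeerrors() (Python 3.6+). The sketch below, with a hypothetical helper to_bytes_name, checks that a manual encode matches os.fsencode() and that decoding returns the original name:

```python
import os
import sys


def to_bytes_name(name):
    """Hypothetical helper: encode a str file name the way the os module does."""
    return name.encode(sys.getfilesystemencoding(),
                       sys.getfilesystemencodeerrors())


name = "Côte_d'Ivoire.txt"
raw = os.fsencode(name)

# os.fsencode() uses the same encoding and error handler as the manual call.
assert raw == to_bytes_name(name)
# Decoding with the filesystem encoding restores the original str exactly.
assert os.fsdecode(raw) == name

# Shows the encoding in use and the bytes actually handed to the OS.
print(sys.getfilesystemencoding(), raw)
```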

Reference: http://docs.python.org/howto/unicode.html#unicode-filenames
