Lucas

Reputation: 3502

How to rename a folder correctly?

I want to rename folders programmatically. On Windows it works flawlessly, but on Linux the names come out broken: A?A¶A¼ instead of Äöü.
The locale on the Linux machine is fine (en_US.UTF-8). When I create a directory called Äöü by hand in the same path my script uses, it shows correctly as Äöü. When Python creates the directory, it shows up as A?A¶A¼.

Input

Äöü

Actual Output

A?A¶A¼

Expected Output

Aeoeue

I have a set of characters that have to be replaced.

ä : ae        
Ä : Ae
ö : oe
Ö : Oe
ü : ue
Ü : Ue
á : a
à : a
Á : A
....

This is how I read the file:

def __getChars(file):
    chars = {}
    with open(file) as f:
        content = f.readlines()
    for c in content:
        c = c.split(':')
        x = c[0].strip()

        try:
            y = c[1].strip()
        except:
            y = ''
        chars[x] = y
    return chars

 

This is how I replace the names:

def __replace(string):
    try:
        string = string.encode('utf8')
    except:
        pass
    for char in chars.keys():
        key = char
        value = chars[key]
        string = string.replace(key, value)
    return string

This is how I call __replace():

chars = __getChars(os.path.join(os.getcwd(), 'system', 'replace'))

for path in os.listdir(root):
    src = os.path.join(root, path)
    if os.path.isdir(src):
        dst = os.path.join(root, __replace(repr(path).decode('unicode-escape')))
        if src != dst:
            os.rename(src, dst)

Upvotes: 2

Views: 160

Answers (1)

hynekcer

Reputation: 15548

The problem was in this hack:

repr(path).decode('unicode-escape')

You cannot rely on byte strings having the same encoding on different systems, not even on Windows with different system encodings or with differently compiled Pythons, e.g. Cygwin or PyWin. The only reliable method is to get the list of filenames as unicode by calling listdir with a unicode path, e.g. os.listdir(u'.') or os.listdir(root.decode('utf-8')). It is much better to do all filesystem operations in unicode.
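
As a rough illustration of that advice, here is a minimal sketch of the rename loop from the question with the repr/decode hack removed. The helper name rename_dirs and its replace argument are hypothetical, not part of the original script. It assumes root is already a text (unicode) string: in Python 2 that means u'...' or root.decode('utf-8'), in Python 3 an ordinary str.

```python
import os


def rename_dirs(root, replace):
    """Rename each directory directly under root using replace(name).

    root is assumed to be a text (unicode) string, so os.listdir returns
    text names and no manual decoding of byte strings is needed.
    Returns a list of (old_name, new_name) pairs for the renamed entries.
    """
    renamed = []
    for name in os.listdir(root):
        src = os.path.join(root, name)
        if not os.path.isdir(src):
            continue  # only directories are renamed, as in the question
        new_name = replace(name)
        dst = os.path.join(root, new_name)
        if src != dst:
            os.rename(src, dst)
            renamed.append((name, new_name))
    return renamed
```

Called as rename_dirs(u'.', some_function), the names passed to some_function are unicode on every platform, so the replacement table can be applied to them directly.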

I even wrote a similar simple program that works in both Python 2 and Python 3 without any hacks; it can rename all files and directories in a tree to ASCII. Most of your substitutions, the ones that only remove accents from letters, can be done with:

from unicodedata import normalize
out_str = normalize('NFKD', in_str).encode('ascii', 'ignore').decode('ascii')
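
Note that normalize alone turns Äöü into Aou, not the Aeoeue the question expects, so the two-letter substitutions still need an explicit table. The following is a minimal sketch combining both steps. The name to_ascii and the TABLE dictionary are hypothetical, and the table holds only the six umlaut entries from the question's list.

```python
from unicodedata import normalize

# Hypothetical subset of the question's replacement table (umlauts only).
TABLE = {
    u'\u00e4': u'ae',  # ä
    u'\u00c4': u'Ae',  # Ä
    u'\u00f6': u'oe',  # ö
    u'\u00d6': u'Oe',  # Ö
    u'\u00fc': u'ue',  # ü
    u'\u00dc': u'Ue',  # Ü
}


def to_ascii(name, table=TABLE):
    """Apply explicit substitutions, then strip any remaining accents."""
    # Compose first so that precomposed keys such as u'\u00e4' match even
    # when the filesystem returned a decomposed form (a + combining mark).
    name = normalize('NFC', name)
    for key, value in table.items():
        name = name.replace(key, value)
    # Decompose what is left and drop every non-ASCII code point.
    return normalize('NFKD', name).encode('ascii', 'ignore').decode('ascii')


print(to_ascii(u'\u00c4\u00f6\u00fc'))  # Aeoeue
print(to_ascii(u'\u00e1\u00e0\u00c1'))  # aaA
```

Characters that have no ASCII decomposition, such as ß, are dropped silently by the final step, so they would need their own table entries.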

Upvotes: 1
