Reputation: 3502
I want to edit folder names programmatically. On Windows it works flawlessly; on Linux it is kind of broken: A?A¶A¼ (Äöü).
The locale on Linux works fine; I am using en_US.UTF-8.
When I create a directory called Äöü by hand in the same path my script uses, it shows correctly as Äöü. When Python generates the directory, it is visible as A?A¶A¼.
Original name: Äöü
What I get: A?A¶A¼
What I expect: Aeoeue
I have a set of characters that have to be replaced.
ä : ae
Ä : Ae
ö : oe
Ö : Oe
ü : ue
Ü : Ue
á : a
à : a
Á : A
....
This is how I read the file:
def __getChars(file):
    chars = {}
    with open(file) as f:
        content = f.readlines()
    for c in content:
        c = c.split(':')
        x = c[0].strip()
        try:
            y = c[1].strip()
        except:
            y = ''
        chars[x] = y
    return chars
This is how I replace the names:
def __replace(string):
    try:
        string = string.encode('utf8')
    except:
        pass
    for char in chars.keys():
        key = char
        value = chars[key]
        string = string.replace(key, value)
    return string
This is how I call __replace():
chars = __getChars(os.path.join(os.getcwd(), 'system', 'replace'))
for path in os.listdir(root):
    src = os.path.join(root, path)
    if os.path.isdir(src):
        dst = os.path.join(root, __replace(repr(path).decode('unicode-escape')))
        if src != dst:
            os.rename(src, dst)
Upvotes: 2
Views: 160
Reputation: 15548
The problem was in this hack:
repr(path).decode('unicode-escape')
You cannot be sure how byte-string filenames are encoded on different systems, not even on Windows with different system encodings or a differently compiled Python, e.g. CygWin or PyWin. The only sure method is to get the list of filenames in unicode by calling listdir with a unicode path, e.g. os.listdir(u'.') or os.listdir(root.decode('utf-8')). It is much better to do all filesystem operations in unicode.
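To illustrate doing everything in unicode, here is a minimal sketch, assuming Python 3, where an ordinary str path is already unicode (in Python 2 you would pass a u'...' path instead). The names rename_dirs and transform are made up for this example and are not from the question; transform stands for whatever replacement function you use.

```python
import os


def rename_dirs(root, transform):
    """Rename each directory directly under root to transform(name).

    root must be a text (unicode) string so that os.listdir returns
    text names rather than raw bytes in an unknown encoding.
    """
    renamed = []
    for name in os.listdir(root):
        src = os.path.join(root, name)
        if not os.path.isdir(src):
            continue  # only directories are renamed, as in the question
        dst = os.path.join(root, transform(name))
        # Skip unchanged names and never overwrite an existing entry.
        if src != dst and not os.path.exists(dst):
            os.rename(src, dst)
            renamed.append((src, dst))
    return renamed
```

Because the names never pass through repr() or a manual decode step, the same code should behave the same way on Windows and Linux, as long as the filesystem names themselves are valid in the filesystem encoding.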
I even wrote a similar simple program that works in both Python 2 and Python 3 without any hack, and it can rename all files and directories in a tree to ASCII. Most of your substitutions, the ones that only remove accents from letters, can be done by
from unicodedata import normalize
out_str = normalize('NFKD', in_str).encode('ascii', 'ignore').decode('ascii')
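Note that normalize alone turns ä into a, not ae, so the explicit pairs from your list still have to be applied first. A rough sketch of combining both steps, assuming Python 3; the names REPLACEMENTS and to_ascii are invented for this example, and the table only contains the six umlaut pairs from the question:

```python
from unicodedata import normalize

# Explicit pairs taken from the question. They must run before accent
# stripping, because NFKD alone would map "ä" to "a" instead of "ae".
REPLACEMENTS = {
    'ä': 'ae', 'Ä': 'Ae',
    'ö': 'oe', 'Ö': 'Oe',
    'ü': 'ue', 'Ü': 'Ue',
}


def to_ascii(text, replacements=REPLACEMENTS):
    """Apply the explicit replacements, then strip remaining accents."""
    # Compose first so that "a" + combining diaeresis also matches "ä".
    text = normalize('NFC', text)
    for src, dst in replacements.items():
        text = text.replace(src, dst)
    # Decompose, then drop every code point that is not plain ASCII.
    return normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')
```

Be aware that 'ignore' silently drops characters that have no ASCII decomposition (ß, for example), so those need an explicit entry in the table if they matter to you.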
Upvotes: 1