Reputation: 3
A simple script is not behaving as expected.
notAllowed = {"â":"a", "à":"a", "é":"e", "è":"e", "ê":"e",
              "î":"i", "ô":"o", "ç":"c", "û":"u"}
word = "dôzerté"
print word
for char in word:
    if char in notAllowed.keys():
        print "hooray"
        word = word.replace(char, notAllowed[char])
print word
print "finished"
The output returns the word unchanged, even though it should have replaced "ô" and "é" with "o" and "e", giving "dozerte".
Any ideas?
Upvotes: 0
Views: 1158
Reputation: 12478
In Python 2, iterating a plain string iterates its bytes, not necessarily its characters. If your Python source file is encoded in UTF-8, len(word) will be 9 instead of 7 (both special characters take two bytes in UTF-8). Iterating a unicode string (u"dôzerté") iterates characters, so that should work.
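A small Python 3 sketch of the byte/character distinction described above. Here a bytes object stands in for a Python 2 str and a Python 3 str stands in for a Python 2 unicode string; the values assume the accented letters are stored precomposed (one code point each).

```python
# Python 3 sketch: bytes plays the role of a Python 2 byte string,
# str plays the role of a Python 2 unicode string.
word = "dôzerté"
raw = word.encode("utf-8")

print(len(word))  # 7 characters
print(len(raw))   # 9 bytes: "ô" and "é" each take two bytes in UTF-8

# No single byte equals the character "ô", so a byte-wise lookup never matches.
print([hex(b) for b in "ô".encode("utf-8")])  # ['0xc3', '0xb4']
```

This is why the original loop never prints "hooray": each iteration sees one half of a two-byte sequence, which is never a key of notAllowed.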
May I also suggest you use unidecode for the task you're trying to achieve?
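If adding a third-party dependency is not an option, a rough standard-library alternative (a different technique from unidecode, shown only as a sketch) is to decompose the text with unicodedata and drop the combining accent marks. It covers accented Latin letters like the ones in notAllowed, but unidecode handles far more scripts. The function name strip_accents is hypothetical.

```python
import unicodedata

def strip_accents(text):
    # NFKD splits e.g. "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the base characters, dropping the combining marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("dôzerté"))  # dozerte
```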
Upvotes: 2
Reputation: 9948
How about:
# -*- coding: utf-8 -*-
notAllowed = {u"â":u"a", u"à":u"a", u"é":u"e", u"è":u"e", u"ê":u"e",
              u"î":u"i", u"ô":u"o", u"ç":u"c", u"û":u"u"}
word = u"dôzerté"
print word
for char in word:
    if char in notAllowed.keys():
        print "hooray"
        word = word.replace(char, notAllowed[char])
print word
print "finished"
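For reference, a Python 3 sketch of the same fix. In Python 3 every str literal is already a unicode string and source files default to UTF-8, so neither the u prefix nor the coding line is needed. The second half swaps in str.translate with str.maketrans, which does the whole mapping in one call.

```python
# Python 3 sketch: str literals are already unicode strings.
notAllowed = {"â": "a", "à": "a", "é": "e", "è": "e", "ê": "e",
              "î": "i", "ô": "o", "ç": "c", "û": "u"}

word = "dôzerté"
for char in word:              # iterates characters, not bytes
    if char in notAllowed:     # "in" on a dict already checks its keys
        word = word.replace(char, notAllowed[char])
print(word)  # dozerte

# Same result in one call: str.maketrans accepts a dict with single-character keys.
translated = "dôzerté".translate(str.maketrans(notAllowed))
print(translated)  # dozerte
```

Rebinding word inside the loop is safe here because the for loop keeps iterating over the original string object.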
Basically, to assign a unicode string to a variable you need to write:
u"..."
# instead of just
"..."
to mark it as a unicode string.
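The u prefix only applies to literals written in the source. Text that arrives as encoded bytes (read from a file or a socket, the equivalent of a Python 2 str) has to be decoded before character-wise operations work; in Python 2 that is word.decode("utf-8"), which returns a unicode object. A minimal Python 3 sketch, assuming the bytes are UTF-8:

```python
# Python 3 sketch: bytes from a file or socket are not text yet.
raw = b"d\xc3\xb4zert\xc3\xa9"   # UTF-8 bytes for "dôzerté"
text = raw.decode("utf-8")       # now a str of 7 characters
print(len(raw), len(text))       # 9 7
print(text)                      # dôzerté
```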
Upvotes: 2