Reputation: 11100
I need to replace German Umlauts (Ä, ä, Ö, ö, Ü, ü, ß) with their two-letter equivalents (Ae, ae, Oe, oe, Ue, ue, ss).
Currently, I have this function, but the string's length changes:
import re

def _translate_umlauts(s):
    """Translate a string into ASCII.

    This Umlaut translation comes from http://stackoverflow.com/a/2400577/152439
    """
    trans = {"\xe4": "ae"}  # and more ...
    patt = re.compile("|".join(trans.keys()))
    return patt.sub(lambda x: trans[x.group()], s)
However, I have the requirement that the string's total length should not change. For example, Mär should become Mae.
Any help in deriving the appropriate solution (regex?) is greatly appreciated :)
Upvotes: 1
Views: 1589
Reputation: 177971
Just truncate back to the original string length:
return patt.sub(lambda x: trans[x.group()], s)[:len(s)]
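For reference, here is a minimal self-contained sketch of the truncation approach. The mapping is built from the pairs listed in the question; the names `TRANS`, `PATT` and `translate_umlauts_truncated` are made up for illustration. Be aware that slicing drops characters from the end of the string, not the character right after each umlaut, so longer words lose their tail.

```python
import re

# Mapping taken from the pairs named in the question (Ä, ä, Ö, ö, Ü, ü, ß).
TRANS = {
    "\xc4": "Ae", "\xe4": "ae",
    "\xd6": "Oe", "\xf6": "oe",
    "\xdc": "Ue", "\xfc": "ue",
    "\xdf": "ss",
}
PATT = re.compile("|".join(map(re.escape, TRANS)))

def translate_umlauts_truncated(s):
    """Expand every umlaut, then cut the result back to len(s)."""
    expanded = PATT.sub(lambda m: TRANS[m.group()], s)
    return expanded[:len(s)]  # anything past the original length is lost
```

With this sketch, "M\xe4r" becomes "Mae" as requested, while "M\xe4rchen" becomes "Maerche": the length is preserved, but the final "n" is cut off.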
Upvotes: 1
Reputation: 338326
... the string's total length should not change.
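Well, that's an odd requirement, but you can make the pattern also consume the character that follows the umlaut, so that the two-letter replacement overwrites both:
patt = re.compile("([" + "".join(trans.keys()) + "]).")
The umlaut itself is captured in group 1, so the replacement callback has to look up x.group(1) instead of x.group().
Note that this will not replace an umlaut that is the last character in the string: there is no following character to overwrite, so replacing it would change the string length.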
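Put together, the pattern above could be used roughly like this. This is only a sketch under a few assumptions: the mapping is filled in from the pairs listed in the question, the names `TRANS`, `PATT` and `translate_umlauts_overwrite` are invented for the example, and the callback looks up group 1 because the full match also contains the character after the umlaut.

```python
import re

# Mapping taken from the pairs named in the question (Ä, ä, Ö, ö, Ü, ü, ß).
TRANS = {
    "\xc4": "Ae", "\xe4": "ae",
    "\xd6": "Oe", "\xf6": "oe",
    "\xdc": "Ue", "\xfc": "ue",
    "\xdf": "ss",
}
# One umlaut (captured as group 1) followed by exactly one more character.
PATT = re.compile("([" + "".join(TRANS) + "]).")

def translate_umlauts_overwrite(s):
    """Replace each umlaut plus the character after it with two letters."""
    # m.group() would be umlaut + next char, so look up m.group(1) instead.
    return PATT.sub(lambda m: TRANS[m.group(1)], s)
```

Here "M\xe4r" turns into "Mae" and "M\xe4rchen" into "Maechen". A trailing umlaut, as in "B\xfc", is left untouched, and since "." does not match a newline by default, an umlaut directly before a line break is not replaced either.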
Upvotes: 1