andreas-h

Reputation: 11100

How to replace one umlaut and the following character with two other characters

I need to replace German Umlauts (Ä, ä, Ö, ö, Ü, ü, ß) with their two-letter equivalents (Ae, ae, Oe, oe, Ue, ue, ss).

Currently, I have this function, but the string's length changes:

import re

def _translate_umlauts(s):
    """Translate a string into ASCII.

    This Umlaut translation comes from http://stackoverflow.com/a/2400577/152439
    """
    trans = {"\xe4" : "ae"}   # and more ...
    # Alternation of all keys; each match is looked up in the mapping
    patt = re.compile("|".join(trans.keys()))
    return patt.sub(lambda x: trans[x.group()], s)

However, I have the requirement that the string's total length should not change. For example, Mär should become Mae.

Any help in deriving the appropriate solution (regex?) is greatly appreciated :)

Upvotes: 1

Views: 1589

Answers (2)

Mark Tolonen

Reputation: 177971

Just truncate back to the original string length:

return patt.sub(lambda x: trans[x.group()], s)[:len(s)]
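
For reference, here is a minimal self-contained sketch of the truncation approach. The full mapping and the names `TRANS`, `PATT` and `translate_truncated` are assumptions for illustration; the question only shows the entry for ä and elides the rest.

```python
import re

# Assumed mapping: the question shows only the "ä" entry and elides the others.
TRANS = {"ä": "ae", "ö": "oe", "ü": "ue",
         "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}
PATT = re.compile("|".join(map(re.escape, TRANS)))

def translate_truncated(s):
    """Expand every umlaut, then cut the result back to the original length."""
    expanded = PATT.sub(lambda m: TRANS[m.group()], s)
    return expanded[:len(s)]

print(translate_truncated("Mär"))      # Mae
print(translate_truncated("Märchen"))  # Maerche
```

Note that the slice removes characters from the end of the string, not the character right after each umlaut, so "Märchen" comes out as "Maerche" rather than "Maechen". Whether that is acceptable depends on what the length requirement is for.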

Upvotes: 1

Tomalak

Reputation: 338326

... the string's total length should not change.

Well, that's an odd requirement, but you can make the pattern also consume the character that follows the umlaut:

patt = re.compile("([" + "".join(trans.keys()) + "]).")

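Because the match now includes the following character, the replacement function has to look up x.group(1) (the captured umlaut) instead of x.group(). Note that this will not replace the umlaut if it is the last character in the string, since replacing it there would necessarily lengthen the string.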
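
Put together, the idea could look like the sketch below. The full mapping and the names `TRANS`, `PATT` and `translate_fixed_length` are assumptions for illustration, not part of the original question.

```python
import re

# Assumed mapping: the question shows only the "ä" entry and elides the others.
TRANS = {"ä": "ae", "ö": "oe", "ü": "ue",
         "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"}

# Group 1 captures the umlaut; the trailing "." consumes the next character.
# The keys are plain single characters, so they are safe inside [...].
PATT = re.compile("([" + "".join(TRANS) + "]).")

def translate_fixed_length(s):
    """Replace each umlaut plus the following character with two ASCII letters."""
    return PATT.sub(lambda m: TRANS[m.group(1)], s)

print(translate_fixed_length("Mär"))      # Mae
print(translate_fixed_length("Märchen"))  # Maechen
print(translate_fixed_length("Bö"))       # Bö (trailing umlaut is left alone)
```

By default "." does not match a newline, so an umlaut directly before a line break is also left untouched; passing re.DOTALL to re.compile would change that if needed.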

Upvotes: 1
