tmighty

Reputation: 11399

What is this crazy German character combination to represent an umlaut?

I was just parsing the following website.

There one finds the text

und wären damit auch

At first glance the "ä" looks perfectly fine, but when I inspect it, it turns out not to be the regular "ä" (AscW value 228) but this sequence of two characters:

ascw: 97, char: a
ascw: 776, char: ¨

I have never before seen an "ä" represented like this.

How does a website end up using this odd character combination, and what benefit might it have?

Upvotes: 4

Views: 1108

Answers (3)

David Mansell

Reputation: 1

Using the word "diaeresis" for what is properly called an "umlaut" is wrong. A diaeresis is placed on the second of two adjacent vowels to indicate that they should be pronounced separately. An umlaut is placed over a single letter to indicate that it is pronounced differently (in German "um" = change and "Laut" = sound), as though it were combined with an "e".

Upvotes: -1

connectedMind

Reputation: 439

Oh my, this was the answer to my original problem with the name of a file upload:

Cannot convert argument 2 to ByteString because the character at index 6 has value 776 which is greater than 255

For future reference.
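
A rough Python analogue of that failure, as a sketch only: the file name below is hypothetical, and Latin-1 encoding is used here as a stand-in for any API that accepts only character values up to 255. The decomposed name cannot be encoded because of the combining diaeresis (776), while the NFC-normalized name can.

```python
import unicodedata

# Hypothetical upload name containing "a" + combining diaeresis (U+0308).
filename = "Pla\u0308ne.pdf"

# Latin-1 only covers code points 0..255, so 776 cannot be encoded.
try:
    filename.encode("latin-1")
    fits_before = True
except UnicodeEncodeError:
    fits_before = False

# NFC composes "a" + U+0308 into the single code point U+00E4 (228).
composed = unicodedata.normalize("NFC", filename)
encoded = composed.encode("latin-1")

print(fits_before)  # False
print(encoded)      # b'Pl\xe4ne.pdf'
```

This only helps for characters that have a precomposed form within the 0..255 range; other characters still need a different encoding strategy.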

Upvotes: 0

Codo

Reputation: 78945

What you don't mention in your question is the encoding used. Quite obviously it is a Unicode-based encoding.

In Unicode, code point U+0308 (776 in decimal) is the combining diaeresis. Out of the letter a and the diaeresis, the German character ä is created.

There are indeed two ways to represent German characters with umlauts (ä in this case). Either as a single code point:

U+00E4 latin small letter A with diaeresis

Or as a sequence of two code points:

U+0061 latin small letter A
U+0308 combining diaeresis

Similarly you would combine two code points for an upper case 'Ä':

U+0041 latin capital letter A
U+0308 combining diaeresis
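
The two representations can be inspected with a few lines of Python. This is a minimal sketch using only the standard `unicodedata` module; the variable names are just for illustration.

```python
import unicodedata

precomposed = "\u00e4"   # single code point
decomposed = "a\u0308"   # letter a followed by the combining diaeresis

# Print the decimal value, the U+ notation and the Unicode name of each code point.
for label, text in (("precomposed", precomposed), ("decomposed", decomposed)):
    for ch in text:
        print(label, ord(ch), f"U+{ord(ch):04X}", unicodedata.name(ch))

# Both render as the same glyph, but they are different strings.
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 1 2
```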

In most cases, Unicode works with two code points, as this requires fewer code points to cover a wide range of characters with diacritics. However, for historical reasons, special code points exist for letters with German umlauts and French accents.

The Unicode libraries in most programming languages provide functions to normalize a string, i.e. to either convert all sequences into a single code point where possible (form NFC) or expand all single code points into the two code point sequence (form NFD). Also see Unicode Normalization Forms.
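
As an illustration, here is a small Python sketch of normalization using the standard library's `unicodedata.normalize`. The helper `same_text` is a hypothetical name introduced for this example; it compares two strings after bringing both into the composed form (NFC).

```python
import unicodedata

# The text from the question, with "ä" stored as "a" + U+0308.
text = "und wa\u0308ren damit auch"

nfc = unicodedata.normalize("NFC", text)  # compose where possible
nfd = unicodedata.normalize("NFD", nfc)   # decompose again

print(len(text), len(nfc), len(nfd))      # 21 20 21


def same_text(a: str, b: str) -> bool:
    """Compare two strings regardless of composed/decomposed form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_text("w\u00e4ren", "wa\u0308ren"))  # True
```

Normalizing input once at the boundary (for example when parsing a page) keeps later comparisons and lookups from failing on visually identical strings.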

Upvotes: 6
