Tim Schmelter

Reputation: 460360

Why is "ss" equal to the German sharp-s character 'ß'?

Coming from this question, I'm wondering why ä and ae are different (which makes sense) but ß and ss are treated as equal. I haven't found an answer on SO, even though this question seems related and even mentions "that ß will compare equal to SS in Germany, or similar", but it doesn't say why.

The only resource on MSDN I found was this: How to: Compare Strings

It mentions the following, but also doesn't explain why:

// "They dance in the street." 
// Linguistically (in Windows), "ss" is equal to 
// the German essetz: 'ß' character in both en-US and de-DE cultures. 
.....

So why does this evaluate to true, with de-DE or any other culture:

var ci = new CultureInfo("de-DE");
int result = ci.CompareInfo.Compare("strasse", "straße", CompareOptions.IgnoreNonSpace); // 0
bool equals = String.Equals("strasse", "straße", StringComparison.CurrentCulture); // true
equals = String.Equals("strasse", "straße", StringComparison.InvariantCulture);  // true

Upvotes: 35

Views: 10745

Answers (8)

Alex Zhukovskiy

Reputation: 10055

Starting with .NET 5.0, these comparisons return -1/not equal. See https://learn.microsoft.com/en-us/dotnet/core/compatibility/globalization/5.0/icu-globalization-api for details.
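Since the culture-sensitive result depends on which globalization library the runtime uses, code that must behave the same everywhere can request an ordinal comparison explicitly. The following is only a minimal sketch: the class and method names are hypothetical, and the result of the culture-sensitive call is deliberately not stated because it varies by runtime.

```csharp
using System;
using System.Globalization;

public static class SharpSComparison
{
    // Ordinal comparison looks only at the UTF-16 code units, so the
    // result does not depend on culture data or on the runtime's
    // globalization library.
    public static bool OrdinalEquals(string a, string b) =>
        string.Equals(a, b, StringComparison.Ordinal);

    // Culture-sensitive comparison. For "strasse" vs "straße" the result
    // depends on the runtime, as described in the linked article.
    public static int CultureCompare(string a, string b, string cultureName) =>
        string.Compare(a, b, CultureInfo.GetCultureInfo(cultureName), CompareOptions.None);
}
```

SharpSComparison.OrdinalEquals("strasse", "straße") is false on every runtime, because the two strings contain different characters.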

Upvotes: 1

Sebastian Wagner

Reputation: 2526

Just wait half a century.

This year, after over a century of dispute, German officially added ẞ as a valid uppercase counterpart of the lowercase ß. It will take some time before people get used to the new uppercase form ẞ, but as soon as the capital version dominates, there will be no reason to continue this evil

String.Equals("Mr. Meißner", "Mr. Meissner", StringComparison.CurrentCulture) == true;

hack.

Upvotes: 3

rhavin

Reputation: 1697

Most of what I read here is true. But there are some misconceptions involved, so, as a German, let me set this straight:

ß/ẞ is a genuine German letter coming from a ligature of either ſs or ſz, but never ss. That is, long s followed by either s or z.

A mid-syllable s in German is pronounced /z/, while an s at the start or end of a syllable is pronounced /s/. As the letter z in German is always pronounced /ts/, a way was needed to distinguish the rarer cases where that rule is broken: another letter was added, eventually forming the ligature for the cases where a mid-syllable /s/ sound was needed.

The sound /s/ never occurs at the beginning of genuine German words, and in just one foreign word, where it is (tada!) written with sz: Szene. So the need for a capital ß (ẞ) first arose when capitalization of whole words came into use. ß and ss are not the same; historically ſz and ß are, which is why it is called an "eszett"! There are certain rules that allow replacing ß with ss if ß is not available, which is not the case in modern environments.

The right capitalization of Maße is MAẞE, and the right capitalization of Masse is MASSE. The two are different words in German.
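As a rough illustration of that capitalization rule, here is a small sketch that upper-cases text with the capital sharp s (U+1E9E) instead of expanding ß to SS. The helper name is hypothetical, and it assumes the target font and environment can display ẞ.

```csharp
using System;

public static class GermanUpper
{
    // Hypothetical helper: swap the lowercase sharp s (U+00DF) for the
    // capital sharp s (U+1E9E) first, then upper-case the rest.
    public static string ToUpperKeepingSharpS(string s) =>
        s.Replace('\u00DF', '\u1E9E').ToUpperInvariant();
}
```

With this helper, Maße becomes MAẞE and Masse becomes MASSE, so the two words stay distinguishable in all caps.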

So, in current German, ss is /s/ shortening the preceding vowel and ß is /s/ after a long vowel. Treating ss and ß as equal in any comparison is simply wrong, because it can force words of completely different meaning to compare equal. Period.

Upvotes: 2

TaW

Reputation: 54463

A few background facts:

  • In Swiss German the eszett was eliminated and replaced by ss, in the 70s I think.

  • For uppercase conversion the official German replacement rule has always been, and still is, eszett -> SS, even though an uppercase eszett was defined for Unicode (U+1E9E) a few years ago. I have never seen it anywhere in the wild yet!

  • No such changes or replacements have been made, or have been necessary, for the three umlauts äöü, which have always had proper uppercase versions ÄÖÜ. Replacing them by ae, oe, ue when they are not available is only a workaround, though, hardly better than replacing an eszett by a beta or a 'B'.

So the different comparison results make at least some sense, although the treatment, especially with regard to sorting, is not reliably uniform in Germany between, say, dictionaries, phone books, lists, indices etc.

Upvotes: 2

xanatos

Reputation: 111940

If you look at the Ä page, you'll see that Ä is not always a replacement for Æ (or ae), and it is still used in various languages.

The letter ß, on the other hand:

While the letter "ß" has been used in other languages, it is now only used in German. However, it is not used in Switzerland, Liechtenstein or Namibia.[1] German speakers in Germany, Austria, Belgium,[2] Denmark,[3] Luxembourg[4] and South Tyrol, Italy[5] follow the standard rules for ß.

So the ß is used in a single language, with a single rule (ß == ss), while the Ä is used in multiple languages with multiple rules.

Note that, considering that case folding is:

Case folding is primarily used for caseless comparison of text, such as identifiers in a computer program, rather than actual text transformation

The official Unicode 7.0 Case Folding Properties tell us that

00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S

where 00DF is ß and 0073 is s, so ß can be considered, for caseless comparison, as ss.
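To make that folding rule concrete, here is a minimal sketch of a caseless comparison that applies it by hand. It is not a full Unicode case-folding implementation: only the sharp s mappings (U+00DF and U+1E9E, both folding to 0073 0073) are handled explicitly, everything else is approximated with ToLowerInvariant, and the class and method names are made up for illustration.

```csharp
using System;

public static class SharpSFolding
{
    // Approximates full case folding for the sharp s only:
    // U+00DF and U+1E9E are both replaced by "ss" before lower-casing.
    // ToLowerInvariant stands in for the rest of the folding table and
    // is NOT a complete case-folding implementation.
    public static string Fold(string s) =>
        s.Replace("\u00DF", "ss")
         .Replace("\u1E9E", "ss")
         .ToLowerInvariant();

    // Caseless comparison: fold both sides, then compare ordinally.
    public static bool CaselessEquals(string a, string b) =>
        string.Equals(Fold(a), Fold(b), StringComparison.Ordinal);
}
```

Under this folding, STRASSE and straße compare equal, and so do Maße and Masse, which is exactly the behaviour some of the other answers object to.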

Upvotes: 29

DrKoch

Reputation: 9782

In German the ß character (which exists in lower case only) sounds like ss. Its usage changes from time to time and many people confuse ß and ss. If we write a word like Fuß (foot) in all capitals, we'd write FUSS. If a keyboard or a font does not support ß, we write ss and it is (nearly, mostly) correct.

This may explain why ß and ss are handled as equivalent when it comes to sorting.

Upvotes: -1

BLoB

Reputation: 9725

Some background info for you. Taken from here.

Windows Alt Codes

In Windows, combinations of the ALT key plus a numeric code can be used to type a non-English character (accented letter or punctuation symbol) in any Windows application. More detailed instructions about typing accents with ALT keys are available. Additional options for entering accents in Windows are also listed in the Accents section of this Web site.

Note: The letters ü, ö, ä and ß can be replaced by "ue", "oe", "ae" or "ss" respectively.

German ALT Codes

Sym Windows ALT Code

Ä   ALT+0196
ä   ALT+0228
Ö   ALT+0214
ö   ALT+0246
Ü   ALT+0220
ü   ALT+0252
ß   ALT+0223
€   ALT+0128

Taken from here.

In the German alphabet, the letter ß, called "Eszett" (IPA: [ɛsˈtsɛt]) or "scharfes S", in English "sharp S", is a consonant that evolved as a ligature of "long s and z" (ſz) and "long s over round s" (ſs). When speaking it is pronounced [s] (see IPA). Since the German orthography reform of 1996, it is used only after long vowels and diphthongs, while ss is written after short vowels. The name eszett comes from the two letters S and Z as they are pronounced in German. It is also called scharfes S (IPA: [ˈʃaɐ̯.fəs ˈʔɛs, ˈʃaː.fəs ˈʔɛs]) in German, meaning "sharp S". Its Unicode encoding is U+00DF.

Upvotes: 3

Richard

Reputation: 109190

Because that is how Germans define their own language. Or perhaps more accurately: how those defining sorting/collation for German have defined how Germans define the German language.

In much the same way, English defines the upper case of i as I, but other languages using the Latin alphabet (e.g. Turkish) disagree.

Upvotes: 1
