Stefan Fabian

Reputation: 510

C# Map all possible characters to the alphabet

I'm trying to map all possible letters to the letters A-Z, # for numbers and maybe & for other characters. For that I'm using the Normalize(NormalizationForm) method. That gets rid of most of the unwanted characters like characters with accents and so on.

However, it doesn't deal with duplicates. It seems like the letter M exists at more than one code point, and therefore the equality check fails.

Here's my code that checks every possible letter:

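// Requires: using System.Linq; (for FirstOrDefault and Contains)
for (uint i = char.MinValue; i <= char.MaxValue; i++)
{
    char c = (char)i;
    // Skip non-letters first; this also avoids normalizing lone surrogates, which throws.
    if (!char.IsLetter(c))
        continue;
    char normalizedChar = char.ToUpper(c.ToString().Normalize(System.Text.NormalizationForm.FormKD).FirstOrDefault());
    if (!allowedLetters.Contains(normalizedChar))
        throw new Exception();
}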

Where allowedLetters is a char array containing all letters of the alphabet and '#'.

It fails at i = 181, which is normalized to 924 = 'Μ', which looks just like 77 = 'M'.

I'm also open to better ways to normalize a character, since the only method I could find works only on strings.

Upvotes: 1

Views: 2933

Answers (1)

Reacher Gilt

Reputation: 1813

The NormalizationForm MSDN page explicitly warns about this:

Some Unicode sequences are considered equivalent because they represent the same character. (...) However, ordinal, that is, binary, comparisons consider these sequences different because they contain different Unicode code values. Before performing ordinal comparisons, applications must normalize these strings to decompose them into their basic components.

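That note about "applications must normalize" means that you have some work to do in your application. Note that 924 is not a second copy of M: 181 is the micro sign (U+00B5), which FormKD decomposes to the Greek small letter mu (U+03BC, 956), and ToUpper turns that into the Greek capital letter Mu (U+039C, 924). It only looks like the Latin M (U+004D, 77). I suspect you're going to have to do some hand-mapping, e.g. map[(char)924] = (char)77 or similar.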
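
A minimal sketch of what that hand-mapping could look like, combined with the normalization from the question. The class name LetterBucket, the method Bucket and the Lookalikes table are hypothetical names made up for illustration. The table holds only the one entry discussed here, and whether a Greek Mu should count as "M" at all is a decision for your application.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LetterBucket
{
    // Hypothetical hand-maintained table of look-alike letters that
    // normalization leaves untouched. Only the entry from the question
    // is included; extend it for your own data.
    private static readonly Dictionary<char, char> Lookalikes = new Dictionary<char, char>
    {
        ['\u039C'] = 'M', // GREEK CAPITAL LETTER MU (924), reached from U+00B5 (181)
    };

    // Returns 'A'-'Z' for letters that reduce to a Latin letter,
    // '#' for digits, and '&' for everything else.
    public static char Bucket(char c)
    {
        if (char.IsDigit(c)) return '#';
        // Non-letters go to '&'; this also filters lone surrogates,
        // which Normalize would reject.
        if (!char.IsLetter(c)) return '&';

        // FormKD splits accents off the base letter and unfolds
        // compatibility characters; the base letter comes first.
        string decomposed = c.ToString().Normalize(NormalizationForm.FormKD);
        char upper = char.ToUpperInvariant(decomposed[0]);

        if (upper >= 'A' && upper <= 'Z') return upper;
        return Lookalikes.TryGetValue(upper, out char mapped) ? mapped : '&';
    }
}
```

With this sketch, LetterBucket.Bucket('\u00E9') (é) gives 'E', Bucket('7') gives '#', Bucket('!') gives '&', and Bucket('\u00B5') (the micro sign at 181) gives 'M' through the lookup table. Letters from other scripts that have no table entry fall through to '&' instead of throwing.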

Upvotes: 1
