Reputation: 16309
I'm wondering whether there is a good alternative to the switch statement below that performs no worse. The real switch statement has additional sections for other non-English characters.
Note that I'd love to put multiple case statements per line, but StyleCop doesn't like it and will fail our release build as a result.
var retVal = String.Empty;
switch (valToCheck)
{
    case "é":
    case "ê":
    case "è":
    case "ë":
        retVal = "e";
        break;
    case "à":
    case "â":
    case "ä":
    case "å":
        retVal = "a";
        break;
    default:
        retVal = "-";
        break;
}
Upvotes: 1
Views: 301
Reputation: 216343
The first thing that comes to mind is a Dictionary<char,char>()
(I prefer char instead of strings because you are dealing with chars)
Dictionary<char,char> dict = new Dictionary<char,char>();
dict.Add('å', 'a');
// ......
Then you can remove the entire switch:
char retVal;
char testValue = 'å';
if (!dict.TryGetValue(testValue, out retVal))
    retVal = '-';
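For completeness, here is a minimal sketch of how the whole lookup could be wrapped up, assuming you only need single characters. The class name AccentMap and the method name Lookup are made up for illustration, and the table only contains the characters from the question:

```csharp
using System;
using System.Collections.Generic;

public static class AccentMap
{
    // Hypothetical lookup table; extend it with the remaining characters.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        { 'é', 'e' }, { 'ê', 'e' }, { 'è', 'e' }, { 'ë', 'e' },
        { 'à', 'a' }, { 'â', 'a' }, { 'ä', 'a' }, { 'å', 'a' },
    };

    // Returns the mapped character, or '-' when there is no entry,
    // mirroring the default branch of the original switch.
    public static char Lookup(char c)
    {
        char mapped;
        return Map.TryGetValue(c, out mapped) ? mapped : '-';
    }
}
```

Calling AccentMap.Lookup('å') should give 'a', and any character missing from the table falls back to '-'.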
Upvotes: 4
Reputation: 27713
Use Contains instead of switch.
var retVal = String.Empty;
string es = "éêèë";
if (es.Contains(valToCheck)) retVal = "e";
//etc.
Upvotes: 0
Reputation: 35869
Based on Michael Kaplan's RemoveDiacritics(), you could do something like this:
static char RemoveDiacritics(char c)
{
    string stFormD = c.ToString().Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();
    for (int ich = 0; ich < stFormD.Length; ich++)
    {
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
        if (uc != UnicodeCategory.NonSpacingMark)
        {
            sb.Append(stFormD[ich]);
        }
    }
    return (sb.ToString()[0]);
}
switch (RemoveDiacritics(valToCheck))
{
    case 'e':
        //...
        break;
    case 'a':
        //...
        break;
    //...
}
or, potentially even:
retVal = RemoveDiacritics(valToCheck);
Upvotes: 1
Reputation: 15803
This answer presumes that you are going to apply that switch statement to a string, not just to single characters (though that would also work).
The best approach seems to be the one outlined in this StackOverflow answer.
I adapted it to use LINQ:
var chars = from character in valToCheck.Normalize(NormalizationForm.FormD)
            where CharUnicodeInfo.GetUnicodeCategory(character)
                  != UnicodeCategory.NonSpacingMark
            select character;
return string.Join("", chars).Normalize(NormalizationForm.FormC);
You'll need using directives for System.Globalization (CharUnicodeInfo, UnicodeCategory), System.Text (NormalizationForm) and System.Linq.
Sample input:
string valToCheck = "êéÈöü";
Sample output:
eeEou
Upvotes: 1
Reputation: 18848
You could do a small range check on the character's numeric code point. (These values are Latin-1/Unicode code points rather than ASCII, which stops at 127.)
Assuming InRange(val, min, max) checks whether a number is, yep, in range:
if (InRange(System.Convert.ToInt32(valToCheck), 232, 235))
    return 'e';
else if (InRange(System.Convert.ToInt32(valToCheck), 224, 229))
    return 'a';
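This makes the code a little confusing and depends on the character encoding in use, but it is perhaps something to consider. It also assumes valToCheck is a char: Convert.ToInt32 on a string such as "é" would try to parse it as a number and throw a FormatException.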
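A rough sketch of what that could look like, assuming a char input. InRange is the hypothetical helper mentioned above, and RangeCheck and Simplify are names invented here. Note that the 224 to 229 range also covers á and ã, which the switch in the question does not map:

```csharp
using System;

public static class RangeCheck
{
    // Hypothetical helper: true when min <= value <= max.
    private static bool InRange(int value, int min, int max)
    {
        return value >= min && value <= max;
    }

    public static char Simplify(char c)
    {
        int code = c; // implicit char-to-int conversion yields the UTF-16 code unit
        if (InRange(code, 232, 235)) return 'e'; // è é ê ë
        if (InRange(code, 224, 229)) return 'a'; // à á â ã ä å
        return '-'; // same fallback as the default branch of the switch
    }
}
```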
Upvotes: 1
Reputation: 203825
Well, start off by doing this transformation.
public class CharacterSanitizer
{
    private static Dictionary<string, string> characterMappings = new Dictionary<string, string>();

    static CharacterSanitizer()
    {
        characterMappings.Add("é", "e");
        characterMappings.Add("ê", "e");
        //...
    }

    public static string mapCharacter(string input)
    {
        string output;
        if (characterMappings.TryGetValue(input, out output))
        {
            return output;
        }
        else
        {
            return input;
        }
    }
}
Now the character mappings are part of the data rather than the code. I've hard-coded the values here, but at this point it is simple enough to store the mappings in a file, read the file in, and populate the dictionary from it. That way you not only clean up the code a lot by reducing the case statement to a single text file (outside of the code), but you can also modify the mappings without recompiling.
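As a sketch of the file-based variant, something along these lines might work. The file format (one "é=e" pair per line, with '#' starting a comment) and the names MappingLoader, Parse and Load are assumptions made for this example, not anything prescribed above:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class MappingLoader
{
    // Parses lines of the assumed form "é=e".
    // Blank lines and lines starting with '#' are skipped.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var mappings = new Dictionary<string, string>();
        foreach (var line in lines)
        {
            var trimmed = line.Trim();
            if (trimmed.Length == 0 || trimmed.StartsWith("#", StringComparison.Ordinal))
                continue;

            var parts = trimmed.Split('=');
            if (parts.Length != 2)
                continue; // ignore malformed lines

            mappings[parts[0].Trim()] = parts[1].Trim();
        }
        return mappings;
    }

    // Reads the mapping file (UTF-8 by default) and parses it.
    public static Dictionary<string, string> Load(string path)
    {
        return Parse(File.ReadLines(path));
    }
}
```

The static constructor could then fill characterMappings from MappingLoader.Load with whatever path you choose, instead of a series of Add calls.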
Upvotes: 1