Alexander Galkin

Reputation: 12554

How to check if a Unicode character has diacritics in .NET?

I am developing a heuristic for automatic language detection and would like to find out whether a given letter has diacritics (as in "Ðàäèî Êóëüòóðà", where all letters have diacritics). It would be best if I could also get the type of diacritic.

I browsed through the UnicodeCategory enum but didn't find anything that would help me here.

Upvotes: 7

Views: 8173

Answers (2)

Ashish Thakur

Reputation: 323

Try this:


using System.Globalization;
using System.Text;

// Returns true if any character in the text carries a combining (non-spacing) mark.
public bool CheckIsStringContainDiacriticsCharacter(string text)
{
    // FormD splits precomposed letters into a base letter followed by combining marks.
    var normalizedString = text.Normalize(NormalizationForm.FormD);
    foreach (var c in normalizedString)
    {
        // A non-spacing mark in the decomposed text means a diacritic is present.
        if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
        {
            return true;
        }
    }

    return false;
}

Upvotes: 1

CodesInChaos

Reputation: 108880

One possible way is to normalize it to a form where letters and their diacritics are written as several codepoints. Then check if you have a letter followed by accents.

Adapting from "How do I remove diacritics (accents) from a string in .NET?", you can normalize with Normalize(NormalizationForm.FormD) and check for the diacritics with UnicodeCategory.NonSpacingMark.

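using System.Globalization;
using System.Linq;
using System.Text;

bool IsLetterWithDiacritics(char c)
{
    // FormD decomposes a precomposed letter into its base letter plus combining marks.
    var s = c.ToString().Normalize(NormalizationForm.FormD);
    return (s.Length > 1) &&
           char.IsLetter(s[0]) &&
           s.Skip(1).All(c2 => CharUnicodeInfo.GetUnicodeCategory(c2) == UnicodeCategory.NonSpacingMark);
}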
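
The question also asks for the type of diacritic. One possible extension is to return the combining marks found in the decomposed form instead of a bool. This is a sketch, not a tested solution. The names DiacriticInfo and GetCombiningMarks are hypothetical (they are not part of .NET), and the surrogate guard is my assumption about how to handle characters outside the Basic Multilingual Plane, since a lone surrogate is not a valid string to normalize.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

static class DiacriticInfo
{
    // Hypothetical helper: returns the combining marks attached to a letter,
    // or an empty list if the character is not a letter with diacritics.
    public static IReadOnlyList<char> GetCombiningMarks(char c)
    {
        // A lone surrogate cannot be normalized on its own, so skip it (assumption).
        if (char.IsSurrogate(c))
            return Array.Empty<char>();

        // FormD yields the base letter followed by its combining marks.
        string decomposed = c.ToString().Normalize(NormalizationForm.FormD);
        if (decomposed.Length < 2 || !char.IsLetter(decomposed[0]))
            return Array.Empty<char>();

        // Keep only the non-spacing marks that follow the base letter.
        return decomposed.Skip(1)
            .Where(m => CharUnicodeInfo.GetUnicodeCategory(m) == UnicodeCategory.NonSpacingMark)
            .ToList();
    }
}
```

For example, GetCombiningMarks('é') should contain the single mark U+0301 (combining acute accent), which you can then map to whatever diacritic names your heuristic needs. Note that letters without a canonical decomposition, such as Ð (U+00D0) or ø, yield an empty list even though they look modified.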

Upvotes: 16
