Reputation: 148524
So I saw Jon Skeet's video, and there was a code sample.
There should have been a problem with the é after reversing, but I guess that only fails on .NET 2 (IMHO). Anyway, it worked for me and I saw the correctly reversed string:
char[] a="Les Misérables".ToCharArray();
Array.Reverse(a);
string n= new string(a);
Console.WriteLine (n); //selbarésiM seL
But I took it further:
In Hebrew there is the "Alef" char: א
and I can add a vowel point (niqqud) to it, like: אֳ
(which I believe consists of 2 chars, yet is displayed as one).
But now look what happens:
char[] a="Les Misאֳrables".ToCharArray();
Array.Reverse(a);
string n= new string(a);
Console.WriteLine (n); //selbarֳאsiM seL
The letter and its mark were split apart.
I can understand why this happens:
Console.WriteLine ("אֳ".Length); //2
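To make the two chars visible, each UTF-16 code unit can be printed with its Unicode category. This is only a small sketch: the Describe helper is a made-up name, and it assumes the mark above is U+05B3 (HEBREW POINT HATAF QAMATS) following U+05D0 (HEBREW LETTER ALEF).

```csharp
using System;

public static class CharInspector
{
    // Hypothetical helper: formats one UTF-16 code unit as "U+XXXX Category".
    public static string Describe(char c)
    {
        return string.Format("U+{0:X4} {1}", (int)c, char.GetUnicodeCategory(c));
    }

    public static void Main()
    {
        // Assumed composition of the displayed character: alef + hataf qamats.
        string s = "\u05D0\u05B3";

        foreach (char c in s)
            Console.WriteLine(Describe(c));
        // U+05D0 OtherLetter
        // U+05B3 NonSpacingMark
    }
}
```

The second char is a non-spacing mark, so it is rendered on top of whatever char precedes it. That would explain why it ends up attached to the wrong letter once the array is reversed.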
So I was wondering whether there's a workaround for this kind of issue in C# (or should I build my own mechanism?).
Upvotes: 26
Views: 4705
Reputation: 35716
If you define this extension method (in a static class, with using System.Globalization; in scope):
public static IEnumerable<string> ToTextElements(this string source)
{
var e = StringInfo.GetTextElementEnumerator(source);
while (e.MoveNext())
{
yield return e.GetTextElement();
}
}
you could then write (Reverse() here is the LINQ method, so using System.Linq; is needed):
const string a = "AnyStringYouLike";
var aReversed = string.Concat(a.ToTextElements().Reverse());
Upvotes: 11
Reputation: 55389
The problem is that Array.Reverse isn't aware that certain sequences of char values may combine to form a single character, or "grapheme", and thus shouldn't be reversed. You have to use something that understands Unicode combining character sequences, like TextElementEnumerator:
// using System.Globalization;
TextElementEnumerator enumerator =
StringInfo.GetTextElementEnumerator("Les Misאֳrables");
List<string> elements = new List<string>();
while (enumerator.MoveNext())
elements.Add(enumerator.GetTextElement());
elements.Reverse();
string reversed = string.Concat(elements); // selbarאֳsiM seL
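For reuse, the same approach can be wrapped in a method and checked against the example string. This is a sketch, not a definitive implementation: the class and method names (GraphemeDemo, ReverseTextElements) are made up for illustration, and the input is written with escapes on the assumption that the Hebrew character is U+05D0 followed by U+05B3.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class GraphemeDemo
{
    // Hypothetical helper: reverses a string one text element at a time,
    // so a base char and its combining marks stay together.
    public static string ReverseTextElements(string source)
    {
        var elements = new List<string>();
        TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(source);
        while (enumerator.MoveNext())
            elements.Add(enumerator.GetTextElement());

        elements.Reverse(); // List<T>.Reverse, in place
        return string.Concat(elements);
    }

    public static void Main()
    {
        // Assumed encoding of the example: alef (U+05D0) + hataf qamats (U+05B3).
        string s = "Les Mis\u05D0\u05B3rables";

        Console.WriteLine(s.Length);                               // 15 chars
        Console.WriteLine(new StringInfo(s).LengthInTextElements); // 14 text elements

        string reversed = ReverseTextElements(s);
        // The mark still follows its base letter after reversal.
        Console.WriteLine(reversed == "selbar\u05D0\u05B3siM seL"); // True
    }
}
```

The difference between Length and LengthInTextElements is a quick way to tell whether a string contains multi-char text elements at all.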
Upvotes: 41