Fran Marzoa

Reputation: 4546

C# string not supporting cyrillic chars

I am going nuts with C# encoding: I am trying to store Cyrillic characters in a string, and so far I haven't found a solution.

For example, if I execute the following code:

string test = "АЗУОЫЯЕЁЮИ";

The test variable will contain two question marks for each character instead of the character itself.

It seems to be using ASCII for encoding. I thought all C# strings were UTF-8 by default; if it is using ASCII instead, I haven't found a way to change that, so I don't know what to do.

I am using the MonoDevelop that comes bundled with the Unity game engine, under OS X Yosemite. I DO save these files as UTF-8, and I have double-checked them with iconv in case MonoDevelop wasn't doing it right. They are UTF-8 without any doubt.

I have looked at the C# documentation on encoding, but I am afraid I haven't understood it very well, since I didn't find anything that could help me with this problem.

EDIT: I am adding this code because it shows that the problem is not just a matter of what you see, but of the internal encoding itself. (BTW, that "А" character is not an ASCII "A" but a Russian Cyrillic "А"):

            // Debug code
            string one = "А";
            string two = "А";
            string three = "З";         
            string logMessageOne = (one == two) ? "One is equal to Two" : "One is different than Two";
            string logMessageTwo = (one == three) ? "One is equal to Three" : "One is different than Three";
            string logMessageThree = (one.CompareTo (three) == 0) ? "One is equal to Three" : "One is different than Three";

In all cases it says that all strings are equal.

Upvotes: 1

Views: 4339

Answers (3)

Fran Marzoa

Reputation: 4546

OK, I finally managed to figure out the problem and solve it. It is clearly yet another bug in the Unity editor: it not only wants UTF-8 files, but they MUST have the BOM, even though those bytes are optional according to the UTF-8 specification. To make things worse, the MonoDevelop environment distributed with Unity does NOT save UTF-8 with the BOM, so I ended up adding it manually just to try, and it worked.

Just three steps on the OS X command line:

cp KeyboardRussian.cs aux
echo -ne '\xEF\xBB\xBF' > KeyboardRussian.cs
cat aux >> KeyboardRussian.cs

And it worked like a charm.
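The three steps above can also be wrapped so that running them twice does not prepend a second BOM. This is only a minimal sketch: the function name `add_bom` is made up for illustration, and it assumes a POSIX shell with `head`, `od`, `tr` and `mktemp` available.

```shell
#!/bin/sh
# add_bom FILE: prepend the UTF-8 BOM (EF BB BF) unless FILE already starts with it.
# The name add_bom is hypothetical, chosen for this sketch.
add_bom() {
    file=$1
    # Read the first three bytes as lowercase hex, e.g. "efbbbf".
    first=$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')
    if [ "$first" = "efbbbf" ]; then
        return 0    # BOM already present, nothing to do
    fi
    tmp=$(mktemp) || return 1
    # printf with octal escapes behaves the same in every POSIX shell,
    # unlike echo -ne, which depends on the shell's echo builtin.
    printf '\357\273\277' > "$tmp" || return 1
    cat "$file" >> "$tmp" || return 1
    # Write back in place so the original file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

After sourcing the function, `add_bom KeyboardRussian.cs` would do the same job as the three commands, and calling it again leaves the file unchanged.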

For the sake of credit, ChanibaL mentioned the BOM in his answer, though I didn't notice it.

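In any case, this solution needs no additional tools on OS X. On Windows the same commands will not work as written: cmd's echo has no -ne option, and aux is a reserved device name there. A PowerShell equivalent that rewrites the file as UTF-8 with a BOM would be:

$path = (Resolve-Path "KeyboardRussian.cs").Path
$text = [System.IO.File]::ReadAllText($path)
[System.IO.File]::WriteAllText($path, $text, (New-Object System.Text.UTF8Encoding $true))

Be aware that I haven't tested this on Windows, though it should work.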

Upvotes: 1

Krzysztof Bociurko

Reputation: 4661

Every file containing non-ASCII characters needs to be encoded as UTF-8 with a BOM to work in Unity. By default, MonoDevelop does not do that (it saves plain UTF-8), at least on OS X.

On Windows, open the file in Notepad++ or a similar editor and change the encoding to UTF-8 with BOM. If you're on OS X, I can send you a tool for that.

Once you add the BOM, it usually stays there, so there is no need to repeat this on every save.
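To find out which scripts in a project still need this treatment, something like the following could be used on OS X. It is a rough sketch rather than a tested tool: the function name `list_missing_bom` is invented here, and it assumes a POSIX shell and file names without embedded newlines.

```shell
#!/bin/sh
# list_missing_bom DIR: print every .cs file under DIR that contains
# non-ASCII bytes but does not start with the UTF-8 BOM (EF BB BF).
# The function name is hypothetical, chosen for this sketch.
list_missing_bom() {
    find "$1" -type f -name '*.cs' | while IFS= read -r file; do
        # First three bytes as lowercase hex, e.g. "efbbbf".
        first=$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')
        [ "$first" = "efbbbf" ] && continue
        # Delete all ASCII bytes; whatever is left is non-ASCII content.
        rest=$(LC_ALL=C tr -d '\000-\177' < "$file" | wc -c | tr -d ' ')
        if [ "$rest" -gt 0 ]; then
            printf '%s\n' "$file"
        fi
    done
}
```

Running `list_missing_bom Assets` would then list only the files that Unity is likely to misread. Pure ASCII files are skipped because they decode the same way with or without a BOM.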

Upvotes: 2

Mariachi

Reputation: 139

Maybe you can use a dictionary to map each character to Latin text, and then compare the strings:

        var map = new Dictionary<char, string>
            {
                {'а', "a"},
                {'б', "b"},
                {'в', "v"},
                {'г', "g"},
                {'д', "d"},
                {'е', "e"},
                {'ё', "yo"},
                {'ж', "zh"},
                {'з', "z"},
                {'и', "i"},
                {'й', "j"},
                {'к', "k"},
                {'л', "l"},
                {'м', "m"},
                {'н', "n"},
                {'о', "o"},
                {'п', "p"},
                {'р', "r"},
                {'с', "s"},
                {'т', "t"},
                {'у', "u"},
                {'ф', "f"},
                {'х', "h"},
                {'ц', "c"},
                {'ч', "ch"},
                {'ш', "sh"},
                {'щ', "sch"},
                {'ъ', "j"},
                {'ы', "i"},
                {'ь', "j"},
                {'э', "e"},
                {'ю', "yu"},
                {'я', "ya"},
                {'А', "A"},
                {'Б', "B"},
                {'В', "V"},
                {'Г', "G"},
                {'Д', "D"},
                {'Е', "E"},
                {'Ё', "Yo"},
                {'Ж', "Zh"},
                {'З', "Z"},
                {'И', "I"},
                {'Й', "J"},
                {'К', "K"},
                {'Л', "L"},
                {'М', "M"},
                {'Н', "N"},
                {'О', "O"},
                {'П', "P"},
                {'Р', "R"},
                {'С', "S"},
                {'Т', "T"},
                {'У', "U"},
                {'Ф', "F"},
                {'Х', "H"},
                {'Ц', "C"},
                {'Ч', "Ch"},
                {'Ш', "Sh"},
                {'Щ', "Sch"},
                {'Ъ', "J"},
                {'Ы', "I"},
                {'Ь', "J"},
                {'Э', "E"},
                {'Ю', "Yu"},
                {'Я', "Ya"}
            };
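        // Requires: using System; using System.Collections.Generic; using System.Linq;
        // Transliterate each Cyrillic character through the map and join the results.
        var latinText = string.Concat("АЗУОЫЯЕЁЮИ".Select(c => map[c]));
        Console.WriteLine(latinText);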

Hope this helps.

Upvotes: 0
