Nathan

Reputation: 188

Writing, displaying, and storing Japanese characters in C#

I am working on a project which requires lots of Japanese katakana, hiragana, and kanji characters. The original files are Excel files using the "MS Pゴシック" font. The problem I am having seems to be the same as everyone else's with this type of issue in C#. The solutions I have found all seem to start with typing the text directly into the C# program. What I am trying to do instead is read one of my .xls or .txt files into C# and work with the data using normal C# functions such as string comparison. However, when I do this, nothing happens. Writing or displaying the data produces "?" marks. Nothing new here.

I tried the same idea with C++ and it works perfectly.

The problem is that it has to be C#, not C++, in order to work with the interops for the other software I am using.

Long story short: does C# (System.String) not handle Unicode natively, the way C++ (C strings) does?

I am using Visual Studio C++ 2008 Express and Visual Studio C# 2010 Express. The files are the same, but it works in C++ and not in C#.

Sorry, I haven't used English in a while. I have tried various encodings; the code below is my latest attempt, but I still get "?" marks in the output.

    var listA = new List<string>();

    // Read the first comma-separated column of every line.
    using (var reader = new StreamReader(File.OpenRead(@"C:\smallerBunShou.txt"), Encoding.UTF8))
    {
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            var values = line.Split(',');

            listA.Add(values[0]);
            // listB.Add(values[1]);
            // listC.Add(values[2]);
        }
    }

    int sizeOflistA = listA.Count;

    // Disposing the writer flushes the buffered output to disk.
    using (var file = new StreamWriter(File.OpenWrite(@"C:\WriteLines2.txt"), Encoding.UTF8))
    {
        foreach (string line in listA)
        {
            // If the line doesn't contain the word 'Second', write the line to the file.
            if (!line.Contains("Second"))
            {
                file.WriteLine(line);
            }
        }
    }

I have also tried Encoding.Unicode and others. My computer is a Japanese PC, and the software is mostly Japanese. According to one of the answers so far, this is not a Unicode issue: Japanese PCs commonly use Shift-JIS, which is most likely what I need to look into. When I solve this I will post my solution.

Update: After looking around a bit, I found the Shift-JIS encoding (code page 932) and used it in place of Encoding.UTF8:

    Encoding.GetEncoding(932)

This solved my problem! Thank you @EricFalsken for pointing me in the right direction.
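
A minimal sketch of the full read-and-write round trip with code page 932 could look like the following. The temp-file names, the sample text, and the `ReadFirstColumn` helper are hypothetical and only there to make the example self-contained. The `RegisterProvider` call is needed on .NET Core and .NET 5 or later; on the .NET Framework (which Visual C# 2010 Express targets) code page 932 is available out of the box and that line should be removed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class ShiftJisRoundTrip
{
    // Hypothetical helper: returns the first comma-separated column of each line,
    // decoding the file with the given encoding.
    public static List<string> ReadFirstColumn(string path, Encoding encoding)
    {
        var firstColumn = new List<string>();
        using (var reader = new StreamReader(path, encoding))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                firstColumn.Add(line.Split(',')[0]);
            }
        }
        return firstColumn;
    }

    public static void Main()
    {
        // Only required on .NET Core / .NET 5+; remove on the .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding shiftJis = Encoding.GetEncoding(932);

        // Hypothetical sample files in the temp directory.
        string input = Path.Combine(Path.GetTempPath(), "sjis_input.txt");
        string output = Path.Combine(Path.GetTempPath(), "sjis_output.txt");
        File.WriteAllText(input, "日本語,カタカナ\r\nひらがな,漢字\r\n", shiftJis);

        List<string> listA = ReadFirstColumn(input, shiftJis);

        // Write with the same encoding so other Shift-JIS tools can read the result.
        using (var writer = new StreamWriter(output, false, shiftJis))
        {
            foreach (string item in listA)
            {
                writer.WriteLine(item);
            }
        }

        Console.WriteLine(listA.Count);
    }
}
```

The key point is that the reader and the writer both receive the encoding the file was actually saved in. Once the text is decoded correctly, ordinary string operations such as `Contains` or `==` behave as expected.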

Upvotes: 0

Views: 8970

Answers (1)

Eric Falsken

Reputation: 4932

Normal .txt files are not necessarily saved in a Unicode encoding. You need to specify the encoding when reading the FileStream, by running it through a TextReader (such as a StreamReader) constructed with the matching Encoding, for example Encoding.Unicode for a UTF-16 file.

But note that most Japanese computers and documents do NOT use Unicode. They still use Shift-JIS quite extensively.

I can assure you that all strings in C# support Unicode natively.
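
As a rough illustration of why the encoding passed to the reader matters while the string type itself is fine, the sketch below encodes a katakana string to Shift-JIS bytes and decodes those same bytes twice, once with the matching encoding and once as UTF-8. The class name and sample text are made up for the example. The `RegisterProvider` line is only needed on .NET Core and .NET 5 or later and should be removed on the .NET Framework.

```csharp
using System;
using System.Text;

static class EncodingMismatchDemo
{
    public static void Main()
    {
        // Only required on .NET Core / .NET 5+; remove on the .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding shiftJis = Encoding.GetEncoding(932);

        // A C# string is a sequence of UTF-16 code units, so it holds Japanese text directly.
        string original = "カタカナ";

        // These are the bytes a Shift-JIS text file would contain for that string.
        byte[] sjisBytes = shiftJis.GetBytes(original);

        // Decoding with the matching encoding recovers the text.
        string decodedCorrectly = shiftJis.GetString(sjisBytes);

        // Decoding the same bytes as UTF-8 yields replacement characters and stray ASCII.
        string decodedAsUtf8 = Encoding.UTF8.GetString(sjisBytes);

        Console.WriteLine(decodedCorrectly == original); // True
        Console.WriteLine(decodedAsUtf8 == original);    // False
    }
}
```

The string comparison only fails in the second case because the bytes were misinterpreted on the way in, not because `System.String` cannot represent the characters.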

Upvotes: 4
