Encoding doesn't take effect when reading a Word document in C#

Question

I am trying to read Unicode characters from a Word document, but I get question marks (????) instead.

Here is my code:

    Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
    object miss = System.Reflection.Missing.Value;
    object enc = Microsoft.Office.Core.MsoEncoding.msoEncodingEUCJapanese;
    object path = @"C:\Users\file.doc";
    object readOnly = true;
    // Encoding is the 11th argument of Documents.Open.
    Microsoft.Office.Interop.Word.Document docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss, ref miss,
        ref miss, ref miss, ref miss, ref miss, ref miss, ref enc, ref miss, ref miss, ref miss, ref miss, ref miss);
    string totaltext = "";
    // Paragraphs is 1-based, hence the i + 1 index.
    for (int i = 0; i < docs.Paragraphs.Count; i++)
    {
        totaltext += " \r\n " + docs.Paragraphs[i + 1].Range.Text.ToString();

        Console.WriteLine(totaltext);
    }
    // Console.WriteLine(totaltext);
    docs.Close();
    word.Quit();

Jon Skeet · Accepted Answer

Given the comments, it sounds like the problem may well just be with Console.WriteLine.
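
One plausible mechanism (not confirmed by the question) is that the console encodes the string with a code page that cannot represent the characters, so each one is replaced by '?'. The sketch below illustrates that kind of lossy conversion, using Encoding.ASCII as a stand-in for the console's code page. The names ReplacementDemo and RoundTrip and the sample text are hypothetical and exist only for illustration.

```csharp
using System;
using System.Text;

public static class ReplacementDemo
{
    // Encodes the text to bytes with the given encoding and decodes it again.
    // Characters the encoding cannot represent are lost along the way.
    public static string RoundTrip(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        // Sample Japanese text written with escapes, so the result does not
        // depend on how this source file is saved (an assumption for the demo).
        string text = "\u65E5\u672C\u8A9E";

        // ASCII cannot represent these characters; its default fallback
        // substitutes '?' for each one.
        Console.WriteLine(RoundTrip(Encoding.ASCII, text) == "???");

        // UTF-8 can represent them, so the round trip is lossless.
        Console.WriteLine(RoundTrip(Encoding.UTF8, text) == text);
    }
}
```

If the string held in memory is correct and only the console conversion is lossy, the text read from Word is fine and the problem lies purely in the output path.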

Try writing to a file instead:

// This will use Encoding.UTF8 by default.
using (var writer = File.CreateText("test.txt"))
{
    for (int i = 0; i < docs.Paragraphs.Count; i++)
    {
        writer.WriteLine(docs.Paragraphs[i + 1].Range.Text.ToString());
    }
}

Then open the file in Notepad, specifying UTF-8 as the encoding, and I suspect you'll see everything correctly.
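
As an alternative to checking by eye in Notepad, the file can be read back as UTF-8 and compared with what was written. This is only a minimal sketch: the class name Utf8FileCheck, the method WriteThenRead, the temporary file path, and the sample text are all hypothetical, and a hard-coded string stands in for the paragraphs read from Word.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8FileCheck
{
    // Writes each line with File.CreateText (which uses UTF-8)
    // and then reads the file back, decoding it as UTF-8.
    public static string[] WriteThenRead(string path, string[] lines)
    {
        using (var writer = File.CreateText(path))
        {
            foreach (string line in lines)
            {
                writer.WriteLine(line);
            }
        }
        return File.ReadAllLines(path, Encoding.UTF8);
    }

    public static void Main()
    {
        // Hypothetical output location for the demo.
        string path = Path.Combine(Path.GetTempPath(), "test.txt");

        // Stand-in for the text of one Word paragraph.
        string[] sample = { "\u65E5\u672C\u8A9E" };

        string[] readBack = WriteThenRead(path, sample);

        // Prints a Boolean rather than the text itself, so the result
        // does not depend on the console's own encoding.
        Console.WriteLine(readBack.Length == 1 && readBack[0] == sample[0]);
    }
}
```

If the comparison succeeds for text taken from the document, the characters survived the trip from Word into the program, which points at the console output rather than the Open call.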
