Hussein Zawawi

Reputation: 2927

Encoding doesn't take effect when reading a Word document in C#

I'm trying to read Unicode characters from a Word document, but I get question marks (????) instead.

Here is my code:

    Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
    object miss = System.Reflection.Missing.Value;
    object enc = Microsoft.Office.Core.MsoEncoding.msoEncodingEUCJapanese;
    object path = @"C:\Users\file.doc";
    object readOnly = true;
    Microsoft.Office.Interop.Word.Document docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss, ref miss,
        ref miss, ref miss, ref miss, ref miss, ref miss, ref enc, ref miss, ref miss, ref miss, ref miss, ref miss);
    string totaltext = "";
    // Word collections are 1-based, hence Paragraphs[i + 1].
    for (int i = 0; i < docs.Paragraphs.Count; i++)
    {
        totaltext += " \r\n " + docs.Paragraphs[i + 1].Range.Text.ToString();
        Console.WriteLine(totaltext);
    }
    // Console.WriteLine(totaltext);
    docs.Close();
    word.Quit();

Upvotes: 1

Views: 1916

Answers (1)

Jon Skeet

Reputation: 1499770

Given the comments, it sounds like the problem may well just be with Console.WriteLine.

Try writing to a file instead:

// This will use Encoding.UTF8 by default.
using (var writer = File.CreateText("test.txt"))
{
    for (int i = 0; i < docs.Paragraphs.Count; i++)
    {
        writer.WriteLine(docs.Paragraphs[i + 1].Range.Text.ToString());
    }
}

Then open the file in Notepad, specifying UTF-8 as the encoding, and I suspect you'll see everything correctly.
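
If you would rather check from the console itself, here is a minimal sketch. It assumes the text read from Word is already correct and that only the console output is lossy. The `ToCodeUnits` helper and the sample string are hypothetical; the sample stands in for `docs.Paragraphs[n].Range.Text`. Setting `Console.OutputEncoding` may still not be enough if the console font lacks the glyphs.

```csharp
using System;
using System.Linq;
using System.Text;

static class EncodingCheck
{
    // Hypothetical helper: renders each UTF-16 code unit of s as U+XXXX,
    // so the string's contents can be inspected without relying on the console font.
    public static string ToCodeUnits(string s) =>
        string.Join(" ", s.Select(c => "U+" + ((int)c).ToString("X4")));

    static void Main()
    {
        // Ask the console to emit UTF-8 instead of the default OEM code page.
        Console.OutputEncoding = Encoding.UTF8;

        // Stand-in for text read from the document (assumption for illustration).
        string sample = "日本語";

        Console.WriteLine(sample);              // may still show ??? with some fonts
        Console.WriteLine(ToCodeUnits(sample)); // U+65E5 U+672C U+8A9E
    }
}
```

If the code units are the ones you expect but the first line still shows question marks, the string itself is fine and only the console display is at fault.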

Upvotes: 2
