Reputation: 2927
I'm trying to read Unicode characters from a Word document, but I get question marks (????) instead.
Here is my code:
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
object miss = System.Reflection.Missing.Value;
object enc = Microsoft.Office.Core.MsoEncoding.msoEncodingEUCJapanese;
object path = @"C:\Users\file.doc";
object readOnly = true;
Microsoft.Office.Interop.Word.Document docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss, ref miss,
ref miss, ref miss, ref miss, ref miss, ref miss, ref enc, ref miss, ref miss, ref miss, ref miss, ref miss);
string totaltext = "";
for (int i = 0; i < docs.Paragraphs.Count; i++)
{
totaltext += " \r\n " + docs.Paragraphs[i + 1].Range.Text.ToString();
Console.WriteLine(totaltext);
}
// Console.WriteLine(totaltext);
docs.Close();
word.Quit();
Upvotes: 1
Views: 1916
Reputation: 1499770
Given the comments, it sounds like the problem may well just be with Console.WriteLine.
Try writing to a file instead:
// This will use Encoding.UTF8 by default.
using (var writer = File.CreateText("test.txt"))
{
for (int i = 0; i < docs.Paragraphs.Count; i++)
{
writer.WriteLine(docs.Paragraphs[i + 1].Range.Text.ToString());
}
}
Then open the file in Notepad, specifying UTF-8 as the encoding, and I suspect you'll see everything correctly.
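To illustrate why question marks can show up even when the string itself is fine, here is a minimal sketch. It assumes the output stream uses an encoding that cannot represent the characters (plain ASCII is used as a stand-in; your console's actual encoding may differ). The `EncodingDemo` class, the `RoundTrip` helper and the sample text are hypothetical names made up for this example.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Encodes the text with the given encoding and decodes it again,
    // mimicking output that passes through a stream limited to that encoding.
    public static string RoundTrip(string text, Encoding encoding)
    {
        return encoding.GetString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        // Hypothetical sample: three Japanese characters, written as escapes
        // so the source file's own encoding does not matter.
        string sample = "\u65E5\u672C\u8A9E";

        // ASCII cannot represent these characters, so each one is replaced by '?'.
        Console.WriteLine(RoundTrip(sample, Encoding.ASCII) == "???");

        // UTF-8 can represent every Unicode character, so the text survives intact.
        Console.WriteLine(RoundTrip(sample, Encoding.UTF8) == sample);
    }
}
```

If the same thing is happening in your program, the text read from Word is probably correct and only the console output is lossy. If you do need console output, setting `Console.OutputEncoding = Encoding.UTF8` before writing may help, provided the console font has the required glyphs.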
Upvotes: 2