Reputation: 688
I am reading a Word file using the Microsoft.Office.Interop.Word.Document class in Visual Studio. The problem is that the file contains special characters such as ρ and λ, and when I read it in C# they are converted to question marks (?).
For example, I am reading a line like:
A child drinks a liquid of density ρ through a vertical straw.
This line is converted into:
A child drinks a liquid of density ? through a vertical straw.
Please help me preserve these characters in their original form.
Here is the code:
public void ReadMsWord()
{
    // variable to store the file path
    string filePath = null;
    // open a dialog box to select the file
    OpenFileDialog file = new OpenFileDialog();
    // dialog box title
    file.Title = "Word File";
    // set the initial directory
    file.InitialDirectory = "c:\\";
    // restore the previous directory when the dialog closes
    file.RestoreDirectory = true;
    // run the if block when the dialog is confirmed with OK
    if (file.ShowDialog() == DialogResult.OK)
    {
        // store the selected file path
        filePath = file.FileName.ToString();
    }
    // create the Word application
    Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.ApplicationClass();
    // placeholder for optional arguments
    object miss = System.Reflection.Missing.Value;
    // selected file path as an object
    object path = filePath;
    // open the document in read/write mode
    object readOnly = false;
    // open the document
    Microsoft.Office.Interop.Word.Document docs = word.Documents.Open(ref path, ref miss,
        ref readOnly, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss,
        ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss);
    try
    {
        // select the whole content of the active window's document
        docs.ActiveWindow.Selection.WholeStory();
        // hand the data over to the clipboard
        docs.ActiveWindow.Selection.Copy();
        // get an IDataObject reference that transfers the clipboard data
        IDataObject data = Clipboard.GetDataObject();
        // read the clipboard data in text format
        string t = data.GetData(DataFormats.Text).ToString();
        // split the text into lines
        string[] y = t.Split('\n');
    }
    catch (Exception)
    {
        // rethrow without resetting the stack trace
        throw;
    }
}
Upvotes: 0
Views: 1606
Reputation: 2096
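Use DataFormats.UnicodeText instead of DataFormats.Text:
t = data.GetData(DataFormats.UnicodeText).ToString();
DataFormats.Text is the ANSI text format, so characters that are not in the system's ANSI code page (such as ρ and λ) come back as ?. Please note that special characters may still be displayed as ? in a console window, but they are shown correctly in e.g. MessageBox.Show or the debugger.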
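To see the same effect without Word or the clipboard, here is a minimal console sketch. It assumes that Encoding.ASCII is an acceptable stand-in for an ANSI code page without Greek letters; the real clipboard conversion uses the system code page, so the details may differ. The names EncodingDemo and RoundTrip are made up for this illustration.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: encode the text to bytes and decode it back.
    // Characters the encoding cannot represent are replaced, by default with '?'.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        string original = "density \u03C1"; // \u03C1 is the Greek letter rho

        // ASCII stands in for an ANSI code page that has no Greek letters.
        string lossy = RoundTrip(original, Encoding.ASCII);

        // UTF-16 (Encoding.Unicode) can represent rho, so nothing is lost.
        string intact = RoundTrip(original, Encoding.Unicode);

        Console.WriteLine(lossy == "density ?"); // True
        Console.WriteLine(intact == original);   // True
    }
}
```

The comparisons are printed as booleans on purpose, so the result does not depend on how the console renders ρ. For the console window itself, setting Console.OutputEncoding = Encoding.UTF8 before writing may help, but whether ρ actually shows up still depends on the console font.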
Upvotes: 2