أحمد صوالحة

Reputation: 323

Read .doc files quickly in C#

I want to extract text from .doc files. I currently use this code:

// Start Word and open the document read-only.
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
object miss = System.Reflection.Missing.Value;
object path = FileToSave_path + FileNameToSave + ".doc";
object readOnly = true;
Microsoft.Office.Interop.Word.Document docs = word.Documents.Open(ref path, ref miss, ref readOnly, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss, ref miss);

// Append the text of every paragraph (Word collections are 1-based).
string totaltext = "";
for (int p = 1; p <= docs.Paragraphs.Count; p++)
{
    totaltext += " \r\n " + docs.Paragraphs[p].Range.Text;
}

docs.Close();
word.Quit();

The problem is that this code is very slow. I have many .doc files, each with many paragraphs. Is there a faster way to extract text from .doc files?

Upvotes: 1

Views: 4347

Answers (1)

Glorfindel

Reputation: 22651

It is so slow because you 'start' Word for every file. This happens behind the scenes, but Word still has to run its startup routines each time. So it helps to create the Word application once, close only the document after each file, and call word.Quit(); a single time when all files are done.
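
As a rough sketch of that idea (not tested against your setup), the loop below keeps one Word instance alive for a whole batch of files. The class name DocTextExtractor and the method ExtractAll are made up for illustration. It also reads doc.Content.Text once per document instead of indexing Paragraphs one by one, which is an additional change on my part that should cut down the number of COM calls; paragraphs come back separated by "\r".

```csharp
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

static class DocTextExtractor
{
    // Hypothetical helper: extracts the text of many .doc files
    // while starting Word only once.
    public static Dictionary<string, string> ExtractAll(IEnumerable<string> paths)
    {
        var result = new Dictionary<string, string>();
        var word = new Word.Application();   // started once for the whole batch
        try
        {
            foreach (string file in paths)
            {
                Word.Document doc = word.Documents.Open(file, ReadOnly: true);
                try
                {
                    // One COM call for the whole document text.
                    result[file] = doc.Content.Text;
                }
                finally
                {
                    // Close only the document; Word itself stays open.
                    doc.Close(Word.WdSaveOptions.wdDoNotSaveChanges);
                }
            }
        }
        finally
        {
            word.Quit();                      // quit once, at the very end
        }
        return result;
    }
}
```

This assumes C# 4 or later, where the optional and ref parameters of Documents.Open can be omitted. The compiler may warn that Close and Quit are ambiguous with events of the same name; that warning also applies to the code in the question.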

You can also look into third-party libraries that can open .doc files without the help of Word. For .docx files, you can use Microsoft's own Open XML SDK.
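
For the .docx case, a minimal sketch with the Open XML SDK (the DocumentFormat.OpenXml NuGet package) could look like the following. DocxTextExtractor and ExtractText are names I made up, and this does not read the older binary .doc format, so it only helps if your files are .docx or can be converted.

```csharp
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class DocxTextExtractor
{
    // Hypothetical helper: returns the text of a .docx file,
    // one line per paragraph, without starting Word.
    public static string ExtractText(string path)
    {
        // false = open read-only
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            var sb = new StringBuilder();
            // Descendants also picks up paragraphs inside tables.
            foreach (Paragraph p in body.Descendants<Paragraph>())
            {
                sb.AppendLine(p.InnerText);
            }
            return sb.ToString();
        }
    }
}
```

Because nothing here goes through COM, there is no per-file Word startup cost at all.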

Upvotes: 2
