WolfyD

Reputation: 873

Finding words in an office word document

I'm working on a program that classifies files into groups based on certain text found in them. Most of the files will probably be .doc or .docx.

My program should be able to compare a list of words with the words in the files. I'm new to C# and I'm only studying programming on my own, so the whole "read a .doc file" thing goes way over my head. Any help would be greatly appreciated!

So far the part of my code that has to do with office is:


if (Path.GetExtension(listBox1.SelectedItem.ToString()) == ".doc" ||
    Path.GetExtension(listBox1.SelectedItem.ToString()) == ".docx")
{
    Microsoft.Office.Interop.Word.Document doc = 
        new Microsoft.Office.Interop.Word.Document(listBox1.SelectedItem.ToString());
    doc.Activate();
}

EDIT:

Sorry if the question wasn't clear enough. My question is:

How can I find out whether the document contains any of the specific words listed in a text file? I have read many other questions, answers and tutorials, and it might be just me, but I totally don't get it.

Upvotes: 2

Views: 2580

Answers (2)

Anderson Rissardi

Reputation: 2547

You seem to be using Microsoft's interop classes, so you can use the Find object from Microsoft.Office.Interop.Word (Word.Find).

MSDN description and HOW TO

The Execute method returns true if the searched range contains the text.

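        // 'doc' is assumed to be an already opened Word.Document.
        // (The original snippet searched a footer object named 'rodape';
        // doc.Content covers the main body of the document instead.)
        Word.Range rng = doc.Content;
        Word.Find find = rng.Find;

        find.ClearFormatting();
        find.Replacement.ClearFormatting(); // Only required if you will replace the text
        if (find.Execute("textToBeFound", false)) // second argument: MatchCase
        {
            // The document contains the word
        }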
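
For the question's case (checking a document against a list of words kept in a text file), a minimal sketch could look like the one below. It assumes a project reference to Microsoft.Office.Interop.Word, Word installed on the machine, and a word list with one word per line. The names `WordListChecker` and `FindWordsInDocument` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Word = Microsoft.Office.Interop.Word;

static class WordListChecker
{
    // Returns the words from wordListPath (assumed: one word per line)
    // that occur in the Word document at docPath.
    public static List<string> FindWordsInDocument(string docPath, string wordListPath)
    {
        var found = new List<string>();
        Word.Application app = new Word.Application();
        Word.Document doc = null;
        try
        {
            // Word resolves relative paths against its own working directory,
            // so pass an absolute path. Opening read-only avoids lock prompts.
            doc = app.Documents.Open(FileName: Path.GetFullPath(docPath), ReadOnly: true);

            foreach (string line in File.ReadAllLines(wordListPath))
            {
                string word = line.Trim();
                if (word.Length == 0) continue;

                // Take a fresh range for every search: after a successful
                // Execute the range is narrowed to the match.
                Word.Range rng = doc.Content;
                rng.Find.ClearFormatting();
                if (rng.Find.Execute(FindText: word, MatchCase: false, MatchWholeWord: true))
                {
                    found.Add(word);
                }
            }
        }
        finally
        {
            // Close without saving and quit Word so no WINWORD.EXE process lingers.
            if (doc != null) doc.Close(SaveChanges: false);
            app.Quit();
        }
        return found;
    }
}
```

A call such as `WordListChecker.FindWordsInDocument(listBox1.SelectedItem.ToString(), @"C:\words.txt")` (the second path is a placeholder) would then return the matching words, and an empty list means none of them were found.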

Another example, from Microsoft:

private void SelectionFind()
{
    // 'Application' and 'missing' (Type.Missing) are provided by the
    // VSTO host class this method lives in.
    object findText = "find me";

    Application.Selection.Find.ClearFormatting();

    if (Application.Selection.Find.Execute(ref findText,
        ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing))
    {
        MessageBox.Show("Text found.");
    }
    else
    {
        MessageBox.Show("The text could not be located.");
    }
}

There are many other ways to do this, though.

Upvotes: 0

Splendor

Reputation: 1396

Here is an introduction on reading text out of a .docx file: http://www.codeproject.com/Articles/20529/Using-DocxToText-to-Extract-Text-from-DOCX-Files
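
As a rough sketch of the general idea (not the linked article's code): a .docx file is a ZIP package, and the main body is normally stored in the entry word/document.xml, with the visible text inside `<w:t>` elements. The example below assumes .NET 4.5 or later (for System.IO.Compression); the class and method names are made up for illustration, and it ignores headers, footers and footnotes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Xml.Linq;

static class DocxText
{
    // WordprocessingML namespace used by the elements in document.xml.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Extracts the body text of a .docx package, one line per paragraph.
    public static string Extract(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found.");

            XDocument xml;
            using (Stream s = entry.Open())
                xml = XDocument.Load(s);

            var sb = new StringBuilder();
            foreach (XElement p in xml.Descendants(W + "p"))
            {
                // The text of a paragraph <w:p> is split across <w:t> runs.
                sb.AppendLine(string.Concat(p.Descendants(W + "t").Select(t => t.Value)));
            }
            return sb.ToString();
        }
    }

    // True if the text contains any of the words (case-insensitive substring match).
    public static bool ContainsAny(string text, IEnumerable<string> words)
    {
        return words.Any(w => text.IndexOf(w, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

With a file on disk this could be used as `DocxText.ContainsAny(DocxText.Extract(File.OpenRead(path)), File.ReadAllLines(wordListPath))`, where both paths are placeholders. Note that `ContainsAny` matches substrings, so "art" would also match "party"; splitting the text into words first would be stricter.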

You could convert the .doc files to .docx files and use the same process for both.
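
One possible way to do that conversion is to let Word itself save the file in the .docx format. The sketch below assumes a reference to Microsoft.Office.Interop.Word and Word 2010 or later (for `SaveAs2`); `DocConverter` and `ConvertToDocx` are hypothetical names.

```csharp
using System.IO;
using Word = Microsoft.Office.Interop.Word;

static class DocConverter
{
    // Saves a copy of the .doc file next to it as .docx and returns the new path.
    public static string ConvertToDocx(string docPath)
    {
        string fullPath = Path.GetFullPath(docPath);
        string docxPath = Path.ChangeExtension(fullPath, ".docx");

        Word.Application app = new Word.Application();
        Word.Document doc = null;
        try
        {
            doc = app.Documents.Open(FileName: fullPath, ReadOnly: true);
            // wdFormatXMLDocument is the Open XML (.docx) format.
            doc.SaveAs2(FileName: docxPath,
                        FileFormat: Word.WdSaveFormat.wdFormatXMLDocument);
        }
        finally
        {
            // Close without saving the original and shut Word down.
            if (doc != null) doc.Close(SaveChanges: false);
            app.Quit();
        }
        return docxPath;
    }
}
```

This route still needs Word on the machine for the conversion step, so it is mostly useful when the .doc files can be converted once up front.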

Upvotes: 1
