Reputation: 873
I'm working on a program that classifies files into groups based on certain text found in them. Most of the files will probably be .doc or .docx.
My program should be able to compare a list of words with the words in the files. I'm new to C# and I'm teaching myself programming, and the whole "read a .doc file" thing goes way over my head, so any help would be greatly appreciated!
So far, the part of my code that deals with Office is:
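string path = listBox1.SelectedItem.ToString();
string extension = Path.GetExtension(path).ToLowerInvariant(); // also matches ".DOC"
if (extension == ".doc" || extension == ".docx")
{
    // "new Document(path)" does not compile; open the file through a Word Application instance
    var word = new Microsoft.Office.Interop.Word.Application();
    Microsoft.Office.Interop.Word.Document doc = word.Documents.Open(path, ReadOnly: true);
    string text = doc.Content.Text; // the document's body text, to compare with the word list
    doc.Close(false); // false = don't save changes
    word.Quit();
}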
EDIT:
Sorry if the question wasn't clear enough. My question is:
How can I find out whether the document contains any of the specific words listed in a text file? I have read many other questions, answers and tutorials, and it might be just me, but I totally don't get it.
Upvotes: 2
Views: 2580
Reputation: 2547
You seem to be using Microsoft's Office interop classes, so you can use Word's Find object (Microsoft.Office.Interop.Word.Find).
Its Execute method returns true if the text is found in the range being searched.
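using Word = Microsoft.Office.Interop.Word; // at the top of the file

// doc is the opened Word.Document; Content is a Range covering the whole main text
Word.Range rng = doc.Content;
Word.Find find = rng.Find;
find.ClearFormatting();
find.Replacement.ClearFormatting(); // Only required if you will replace the text
// The second argument is MatchCase; false makes the search case-insensitive
if (find.Execute("textToBeFound", false))
{
    // The document contains the word
}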
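Another example, from Microsoft's documentation (it assumes a VSTO document-level project, where Application and missing, i.e. System.Type.Missing, are already available):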
private void SelectionFind()
{
    object findText = "find me";
    Application.Selection.Find.ClearFormatting();
    // With C# 4 or later the optional arguments can be omitted: Find.Execute(findText)
    if (Application.Selection.Find.Execute(ref findText,
        ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
        ref missing, ref missing))
    {
        MessageBox.Show("Text found.");
    }
    else
    {
        MessageBox.Show("The text could not be located.");
    }
}
But there are many other ways to do this.
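Since the goal is to check each document against a whole list of words, one alternative is to read the document's text once (for example doc.Content.Text from the interop Document) and do the comparison in plain C#, instead of running Find once per word. This is a minimal sketch under a few assumptions: the text file holds one search word per line, matching is whole-word and case-insensitive, and the names WordListMatcher, LoadWords and FindMatches are hypothetical. Multi-word phrases in the list will never match with this approach; for those, Find or a substring search is a better fit.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: compares a document's text with a list of search words.
public static class WordListMatcher
{
    // Reads one search word per line from a plain-text file, trimming whitespace
    // and skipping blank lines. Lookups in the returned set ignore case.
    public static HashSet<string> LoadWords(string wordListPath)
    {
        return new HashSet<string>(
            File.ReadAllLines(wordListPath)
                .Select(line => line.Trim())
                .Where(line => line.Length > 0),
            StringComparer.OrdinalIgnoreCase);
    }

    // Returns every search word that occurs as a whole word in documentText.
    // An empty list means the document contains none of the words.
    public static List<string> FindMatches(string documentText, ISet<string> searchWords)
    {
        // Split on anything that is not a letter, digit, apostrophe or hyphen,
        // so punctuation and Word's paragraph marks ("\r") act as separators.
        var documentWords = new HashSet<string>(
            Regex.Split(documentText, @"[^\p{L}\p{N}'-]+").Where(w => w.Length > 0),
            StringComparer.OrdinalIgnoreCase);

        return searchWords.Where(w => documentWords.Contains(w)).ToList();
    }
}
```

Usage, with "words.txt" as a placeholder path: var hits = WordListMatcher.FindMatches(doc.Content.Text, WordListMatcher.LoadWords("words.txt")); if hits.Count > 0, the document contains at least one of the listed words, and hits tells you which ones, which is handy for deciding the file's group.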
Upvotes: 0
Reputation: 1396
Here is an introduction to reading text out of a .docx file: http://www.codeproject.com/Articles/20529/Using-DocxToText-to-Extract-Text-from-DOCX-Files
You could convert the .doc files to .docx files and use the same process for both.
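If you go the .docx route, the built-in classes may be enough for a rough extraction: a .docx file is a ZIP package, and the body text is stored in word/document.xml as w:t elements inside w:p paragraphs. This is a minimal sketch, not the DocxToText code from the article; the DocxText class is a hypothetical name, it only reads the main body (headers, footers and footnotes live in other parts), and on .NET Framework it needs references to System.IO.Compression and System.IO.Compression.FileSystem. Old binary .doc files are not ZIP packages, so they would still have to be converted first.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Xml.Linq;

// Hypothetical helper: pulls plain text out of a .docx without Word installed.
public static class DocxText
{
    // WordprocessingML namespace used by the w:p / w:t elements.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of word/document.xml, one line per paragraph.
    public static string Extract(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("word/document.xml not found");
            using (Stream stream = entry.Open())
            {
                XDocument xml = XDocument.Load(stream);
                var sb = new StringBuilder();
                // A paragraph's visible text is often split across several w:t runs.
                // Paragraphs nested in text boxes may show up twice; this is a rough sketch.
                foreach (XElement paragraph in xml.Descendants(W + "p"))
                {
                    foreach (XElement run in paragraph.Descendants(W + "t"))
                        sb.Append(run.Value);
                    sb.AppendLine();
                }
                return sb.ToString();
            }
        }
    }
}
```

The string returned by DocxText.Extract can then be compared with your word list in the same way as text read through interop.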
Upvotes: 1