Niko.K

Reputation: 307

How to check if a Word file contains text in C#

I am processing some Word files, and I would like to check whether the file being processed contains something other than shapes. In my case, that would be plain text.

I know how to detect whether the file contains shapes etc., but I am not sure how to check whether a document contains text.

string path = "C:/Users/Test/Desktop/Test/";
foreach (string file in Directory.EnumerateFiles(path, "*.docx"))
{
    var fileInfo = new FileInfo(file);

    if (!fileInfo.Name.StartsWith("~$"))
    {
        var wordApplication = new Microsoft.Office.Interop.Word.Application();
        var document = wordApplication.Documents.Open(file);

        if (document.Content.Text.Contains(""))
        {
            Console.WriteLine(document.Name);
        }
    }
}

Maybe something like this, to check whether the document contains anything at all?

But whether I run it on a Word file that has text or on one that has none, both get shown in the console.

Upvotes: 2

Views: 2488

Answers (2)

Sandeep Kumar M

Reputation: 3851

You can count the number of words in the Word document.

if (document.Words.Count <= 0)
{
    Console.WriteLine(document.Name);
}
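
A possible reason the check in the question matches every file: in .NET, `string.Contains("")` returns true for any string, so that condition can never be false. As far as I know, Word also keeps a final paragraph mark even in an otherwise empty document, so a comparison against zero may not behave as expected either. Below is a minimal sketch that looks for at least one visible character in `document.Content.Text` instead. The class `WordTextCheck` and the helper `HasVisibleText` are hypothetical names, not part of the Interop API.

```csharp
using System;
using System.Linq;

public static class WordTextCheck
{
    // Hypothetical helper: returns true if the string holds at least one
    // character that is neither whitespace (this includes '\r', which Word
    // uses as a paragraph mark) nor a control character.
    public static bool HasVisibleText(string contentText)
    {
        return !string.IsNullOrEmpty(contentText)
            && contentText.Any(c => !char.IsWhiteSpace(c) && !char.IsControl(c));
    }
}
```

In the loop from the question it could be used as `if (WordTextCheck.HasVisibleText(document.Content.Text))`. I have not verified how inline shapes are represented in `Content.Text`; if they show up as placeholder characters, those would need to be filtered out as well.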

Upvotes: 2

Ralf Bönning

Reputation: 15445

You can use the Open XML SDK from Microsoft to look for specific elements inside a Word Document. This does not require that Office is installed on the machine where your program is running.

For finding shapes, the question "How to get list of shapes in SdtBlock element using Open XML SDK?" gives a nice sample.

To give you an idea, you can iterate through all elements, as in the sample below, to decide whether the Word file is suitable for processing. Please note that this code only sketches the idea.

        using (var package = WordprocessingDocument.Open(wordFileStream, false))
        {
            OpenXmlElement body = package.MainDocumentPart.Document.Body;
            // Descendants() walks the whole tree; Elements() would only return
            // the direct children of the body (paragraphs, tables, ...).
            foreach (OpenXmlElement element in body.Descendants())
            {
                switch (element.LocalName)
                {
                    case "t":   // Text
                        // we have found text
                        break;
                    case "cr":  // Carriage return
                    case "br":  // Break (line, page or column)
                        // we have found a carriage return or a break
                        break;
                    case "p":
                        // we have found a paragraph
                        break;
                    default:
                        // we have found something else
                        break;
                }
            }
        }
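
If the only question is whether the document contains any text at all, the typed API of the Open XML SDK may be more direct than switching on element names. The following is a sketch, assuming the `DocumentFormat.OpenXml` NuGet package is referenced; the class `DocxTextCheck` and the method `ContainsText` are hypothetical names chosen for illustration.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DocxTextCheck
{
    // Hypothetical helper: returns true if any text element (w:t) in the
    // main document body contains non-whitespace text.
    public static bool ContainsText(Stream wordFileStream)
    {
        using (var package = WordprocessingDocument.Open(wordFileStream, false))
        {
            var body = package.MainDocumentPart?.Document?.Body;
            if (body == null)
            {
                return false;
            }

            // Descendants<Text>() searches the whole element tree below the body.
            return body.Descendants<Text>()
                       .Any(t => !string.IsNullOrWhiteSpace(t.Text));
        }
    }
}
```

This only inspects the main document body. Headers, footers and footnotes are stored in separate parts and would need their own check. Text inside text boxes that are nested in the body is also found, which may or may not be what you want when separating "shapes" from "plain text".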

A reference for shapes is found here.

Upvotes: 1
