Reputation: 307
I am processing some Word files, and I would like to check whether the file being processed contains anything other than shapes. In my case that would be plain text.
I know how to detect whether the file contains shapes, but I am not sure how to check whether a document contains text.
string path = "C:/Users/Test/Desktop/Test/";
foreach (string file in Directory.EnumerateFiles(path, "*.docx"))
{
    var fileInfo = new FileInfo(file);
    // Skip the temporary lock files Word creates next to open documents
    if (!fileInfo.Name.StartsWith("~$"))
    {
        var wordApplication = new Microsoft.Office.Interop.Word.Application();
        var document = wordApplication.Documents.Open(file);
        if (document.Content.Text.Contains(""))
        {
            Console.WriteLine(document.Name);
        }
    }
}
Maybe something like this, to check whether the document contains nothing?
However, when I run it on one Word file that has text and one that has no text, both are shown in the console.
Upvotes: 2
Views: 2488
Reputation: 3851
You can count the number of words in the Word document. If the count is zero, the document contains no text:
if (document.Words.Count <= 0)
{
    Console.WriteLine(document.Name);
}
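As a side note on why the check in the question prints every file: in .NET, Contains("") returns true for any string, so the condition can never be false. A minimal sketch of a whitespace-based test follows. HasVisibleText is a hypothetical helper name, and the "\r" sample assumes that an empty document's Content.Text consists of a lone paragraph mark, which is worth verifying against your own files.

```csharp
using System;

static class TextCheck
{
    // Hypothetical helper: true when the string holds at least one
    // non-whitespace character.
    public static bool HasVisibleText(string content) =>
        !string.IsNullOrWhiteSpace(content);

    static void Main()
    {
        // Every string contains the empty string, so this is always true.
        Console.WriteLine("some text".Contains(""));     // True
        Console.WriteLine("\r".Contains(""));            // True

        // Assumption: "\r" stands in for the text of an empty document.
        Console.WriteLine(HasVisibleText("\r"));         // False
        Console.WriteLine(HasVisibleText("Hello\r"));    // True
    }
}
```

With interop this would be called as HasVisibleText(document.Content.Text). As far as I know, the Words collection also counts paragraph marks, so a comparison against zero may need adjusting for empty documents.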
Upvotes: 2
Reputation: 15445
You can use the Open XML SDK from Microsoft to look for specific elements inside a Word Document. This does not require that Office is installed on the machine where your program is running.
For finding shapes, the question "How to get list of shapes in SdtBlock element using Open XML SDK?" gives a nice sample.
To give you an idea, you can iterate through all elements, as in the sample below, to decide whether the Word file is suitable for processing. Please note that this code only sketches the idea.
using (var package = WordprocessingDocument.Open(wordFileStream, false))
{
    OpenXmlElement body = package.MainDocumentPart.Document.Body;
    // Descendants() walks the whole tree. Elements() would only return the
    // body's direct children (paragraphs, tables), so "t" would never match.
    foreach (OpenXmlElement element in body.Descendants())
    {
        switch (element.LocalName)
        {
            // Text
            case "t":
                // we have found text
                break;
            case "cr": // Carriage return
            case "br": // Break (line, page or column)
                // we have found a carriage return or a break
                break;
            case "p":
                // we have found a paragraph
                break;
            default:
                // we have found something else
                break;
        }
    }
}
A reference for shapes is found here.
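If the only question is whether a document contains any text at all, the loop can be reduced to a single query over the text elements. The following is a sketch under assumptions rather than a tested solution: ContainsText and DocxTextProbe are hypothetical names, the folder path is taken from the question, and the DocumentFormat.OpenXml package is assumed to be referenced.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class DocxTextProbe
{
    // Hypothetical helper: true when the main document body holds at least
    // one w:t element with non-whitespace content.
    public static bool ContainsText(string path)
    {
        using (var package = WordprocessingDocument.Open(path, false))
        {
            var body = package.MainDocumentPart?.Document?.Body;
            if (body == null)
            {
                return false;
            }
            return body.Descendants<Text>()
                       .Any(t => !string.IsNullOrWhiteSpace(t.Text));
        }
    }

    static void Main()
    {
        string path = "C:/Users/Test/Desktop/Test/";
        foreach (string file in Directory.EnumerateFiles(path, "*.docx"))
        {
            string name = Path.GetFileName(file);
            // Skip the temporary lock files Word creates next to open documents
            if (name.StartsWith("~$"))
            {
                continue;
            }
            // Print the documents in which no text was found
            if (!ContainsText(file))
            {
                Console.WriteLine(name);
            }
        }
    }
}
```

One caveat: text typed inside a text box is, as far as I know, also stored as w:t elements nested in the body, so a document whose only text lives inside shapes would still count as containing text here. Headers, footers and footnotes live in separate parts and are not inspected by this sketch.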
Upvotes: 1