Reputation: 49
I need to read several Word documents, each with multiple pages, using C# and the Office Interop libraries, and check whether each page contains a specific word. All pages that contain the same word, across all documents, should then be collected into a new document and saved as a PDF. For example, let's say you have product details for each month in a Word document: details for Bananas, Apples, Oranges, etc. Each page only has details about one product. The program should process the documents for several months and create an individual PDF for each product. At the end of the day you will have one PDF for Bananas containing all the Banana details for several months, another PDF for Apples, and so on. I had a look at existing threads here and came up with the prototype shown below. I still have a few issues:
1. How do I check whether a range contains a certain word (e.g. "Banana" in our example)?
2. How do I loop over multiple pages in a document? I can extract a single range and create a PDF, but when I try to process multiple ranges using the page count, word.Selection.Paste(); replaces the previous content, so my PDF ends up with only a single page.
3. How do I make the PDF landscape and A5, with a page break after each page?
My sample program:
// Create a new Microsoft Word application object
Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
word.Visible = true;
// C# doesn't have optional arguments so we'll need a dummy value
object oMissing = System.Reflection.Missing.Value;
// Get list of Word files in specified directory
DirectoryInfo dirInfo = new DirectoryInfo(@"C:\temp");
FileInfo[] wordFiles = dirInfo.GetFiles("*.doc");
//word.Visible = false;
//word.ScreenUpdating = false;
foreach (FileInfo wordFile in wordFiles)
{
    if (!wordFile.FullName.Contains("$"))
    {
        // Cast as Object for word Open method
        Object filename = (Object)wordFile.FullName;
        // Use the dummy value as a placeholder for optional arguments
        Microsoft.Office.Interop.Word.Document doc = word.Documents.Open(ref filename, ref oMissing,
            ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
            ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
            ref oMissing, ref oMissing);
        //doc.Activate();
        object what = WdGoToItem.wdGoToPage;
        object which = WdGoToDirection.wdGoToFirst;
        object count = 1;
        Range startRange = word.Selection.GoTo(ref what, ref which, ref count, ref oMissing);
        object count2 = (int)count + 1;
        Range endRange = word.Selection.GoTo(ref what, ref which, ref count2, ref oMissing);
        endRange.SetRange(startRange.Start, endRange.End - 1);
        endRange.Select();
        word.Selection.Copy();
        //word.Documents.Close();
        //word.Quit();
        word.Documents.Add();
        word.Selection.Paste();
        //Microsoft.Office.Interop.Word.Application word1 = new Microsoft.Office.Interop.Word.Application();
        object outputFileName = wordFile.FullName.Replace(".docx", ".pdf");
        object fileFormat = WdSaveFormat.wdFormatPDF;
        // Save document into PDF Format
        word.ActiveDocument.SaveAs(ref outputFileName, ref fileFormat, ref oMissing, ref oMissing,
            ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing,
            ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing, ref oMissing);
        // Close the Word document, but leave the Word application open.
        // doc has to be cast to type _Document so that it will find the
        // correct Close method.
        object saveChanges = WdSaveOptions.wdDoNotSaveChanges;
        word.Documents.Close(ref saveChanges, ref oMissing, ref oMissing);
        doc = null;
        releaseObject(doc);
    }
}
// word has to be cast to type _Application so that it will find
// the correct Quit method.
((_Application)word).Quit(ref oMissing, ref oMissing, ref oMissing);
word = null;
releaseObject(word);
Here is another attempt, which tries to loop over the pages and check whether the current page contains a specific text. Neither attempt works.
long pageCount = doc.ComputeStatistics(Microsoft.Office.Interop.Word.WdStatistic.wdStatisticPages);
object what = WdGoToItem.wdGoToPage;
object which = WdGoToDirection.wdGoToFirst;
object count = 0;
object count2 = (int)count + 1;
for (long i = 1; i < pageCount; i++)
{
    count = (int)count + 1;
    Range startRange = word.Selection.GoTo(ref what, ref which, ref count, ref oMissing);
    count2 = (int)count + 1;
    Range endRange = word.Selection.GoTo(ref what, ref which, ref count2, ref oMissing);
    endRange.SetRange(startRange.Start, endRange.End - 1);
    endRange.Select();
    word.Selection.Copy();
    word.Documents.Add();
    word.Selection.Paste();
    if (word.Selection.Find.Execute("Something"))
    {
        word.Selection.Copy();
        word.Documents.Add();
        word.Selection.Paste();
    }
}
Appreciate any help on this. Thank you in advance.
Upvotes: 0
Views: 591
Reputation: 49
The problem is that you are looping and adding documents at the same time. When you add a new document it becomes the active document, so the program then tries to get the selection from the newly added document instead of from the original one. You have to select the document you want to work on explicitly each time, because the active document changes whenever you add one. To get a page break, insert a section break of the "next page" type. You can change the page size, orientation, etc. through the PageSetup properties, for example PaperSize (a WdPaperSize value) and Orientation. To find something in a range, you can do something like this:
word.Selection.Find.Execute("something");
Range range = word.Selection.Range;
if (range.Text.Contains("something"))
{
//Do your magic here
}
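For what it's worth, the points above could be combined roughly as follows. This is only a minimal sketch, not a tested solution: the class and method names (PageCollector, GetPageRange, CopyMatchingPages, SaveAsA5LandscapePdf) are made up for illustration, and it assumes C# 4 or later (so optional COM arguments can be omitted) and a Word version that supports ExportAsFixedFormat. It holds explicit Document references instead of relying on Selection or ActiveDocument, so adding a new document does not change what is being read.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper; all names here are illustrative, not part of the Interop API.
static class PageCollector
{
    // Returns the range covering page `pageNumber` (1-based) of `doc`.
    public static Word.Range GetPageRange(Word.Document doc, int pageNumber, int pageCount)
    {
        object what = Word.WdGoToItem.wdGoToPage;
        object which = Word.WdGoToDirection.wdGoToAbsolute;
        object missing = Type.Missing;

        object page = pageNumber;
        object start = doc.GoTo(ref what, ref which, ref page, ref missing).Start;

        object end;
        if (pageNumber < pageCount)
        {
            // The page ends where the next page starts.
            object next = pageNumber + 1;
            end = doc.GoTo(ref what, ref which, ref next, ref missing).Start;
        }
        else
        {
            // The last page runs to the end of the document.
            end = doc.Content.End;
        }
        return doc.Range(ref start, ref end);
    }

    // Appends every page of `source` whose text contains `keyword` to `target`.
    // Returns the number of pages copied.
    public static int CopyMatchingPages(Word.Document source, Word.Document target, string keyword)
    {
        object missing = Type.Missing;
        object collapseEnd = Word.WdCollapseDirection.wdCollapseEnd;
        object breakType = Word.WdBreakType.wdSectionBreakNextPage;

        int pageCount = source.ComputeStatistics(Word.WdStatistic.wdStatisticPages, ref missing);
        int copied = 0;

        for (int i = 1; i <= pageCount; i++)
        {
            Word.Range pageRange = GetPageRange(source, i, pageCount);
            string text = pageRange.Text ?? string.Empty;
            if (text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) < 0)
                continue;

            // Separate pages with a next-page section break, but only when the
            // target already has text, to avoid a leading blank page.
            // Caveat: if the source page itself ends in a manual page break,
            // an extra blank page may still appear.
            bool hasContent = (target.Content.Text ?? string.Empty).Trim().Length > 0;
            Word.Range dest = target.Content;
            dest.Collapse(ref collapseEnd);
            if (hasContent)
            {
                dest.InsertBreak(ref breakType);
                dest = target.Content;
                dest.Collapse(ref collapseEnd);
            }

            // Copies text and formatting without going through the clipboard.
            dest.FormattedText = pageRange.FormattedText;
            copied++;
        }
        return copied;
    }

    // Applies A5 landscape to the whole document and exports it as a PDF.
    public static void SaveAsA5LandscapePdf(Word.Document target, string pdfPath)
    {
        target.PageSetup.PaperSize = Word.WdPaperSize.wdPaperA5;
        target.PageSetup.Orientation = Word.WdOrientation.wdOrientLandscape;
        target.ExportAsFixedFormat(pdfPath, Word.WdExportFormat.wdExportFormatPDF);
    }
}

// Hypothetical usage (the file names are placeholders; `word` is the Application object):
//   Word.Document target = word.Documents.Add();
//   Word.Document source = word.Documents.Open(@"C:\temp\January.docx");
//   PageCollector.CopyMatchingPages(source, target, "Banana");
//   source.Close(SaveChanges: false);
//   PageCollector.SaveAsA5LandscapePdf(target, @"C:\temp\Banana.pdf");
```

The keyword check here uses a plain case-insensitive text search on the page range rather than Find.Execute, mainly because Find moves the range it is called on. One target document per product, reused across all source files, should give one PDF per product.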
Upvotes: 0