L.Pump

Reputation: 3

PowerShell won't read header text in Word documents?

I need to check a large number of Word documents (doc & docx) for a specific text, and I found a great tutorial and script by the Scripting Guys:

https://blogs.technet.microsoft.com/heyscriptingguy/2012/08/01/find-all-word-documents-that-contain-a-specific-phrase/

The script reads all documents in a directory and gives the following output:

  1. Number of times mentioned
  2. Total word count in all documents where the specific text is found
  3. The directory of all files containing the specific text.

This is all I need. However, their code doesn't seem to check the headers of any document, which is exactly where the text I'm looking for is located. Any tips for making the script read header text would be much appreciated.

An alternative solution might be to remove the formatting so that the header text becomes part of the rest of the document? Is this possible?

Edit: Forgot to include the script:

[cmdletBinding()]
Param(
 $Path = "C:\Users\use\Desktop\"
) #end param

$matchCase = $false
$matchWholeWord = $true
$matchWildCards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$wrap = 1
$application = New-Object -comobject word.application
$application.visible = $False
$docs = Get-childitem -path $Path -Recurse -Include *.docx
$findText = "specific text"
$i = 1
$totalwords = 0
$totaldocs = 0

Foreach ($doc in $docs)
{
 Write-Progress -Activity "Processing files" -status "Processing $($doc.FullName)" -PercentComplete ($i /$docs.Count * 100) 
 $document = $application.documents.open($doc.FullName)
 $range = $document.content
 $null = $range.movestart()
 $wordFound = $range.find.execute($findText,$matchCase,
  $matchWholeWord,$matchWildCards,$matchSoundsLike,
  $matchAllWordForms,$forward,$wrap)
  if($wordFound) 
    { 
     $doc.fullname
     $document.Words.count
     $totaldocs ++
     $totalwords += $document.Words.count
    } #end if $wordFound
 $document.close()
 $i++
} #end foreach $doc
$application.quit()
"There are $totaldocs and $($totalwords.tostring('N')) words"

#clean up stuff
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($range) | Out-Null
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($document) | Out-Null
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($application) | Out-Null
Remove-Variable -Name application
[gc]::collect()
[gc]::WaitForPendingFinalizers()

EDIT 2: My colleague suggested reading the section header instead:

Foreach ($doc in $docs)
{
 Write-Progress -Activity "Processing files" -status "Processing $($doc.FullName)" -PercentComplete ($i /$docs.Count * 100) 
 $document = $application.documents.open($doc.FullName)
 # Load first section of the document
 $section = $doc.sections.item(1);
 # Load header
 $header = $section.headers.Item(1);

 # Set the range to be searched to only Header
 $range = $header.content
 $null = $range.movestart()

 $wordFound = $range.find.execute($findText,$matchCase,
  $matchWholeWord,$matchWildCards,$matchSoundsLike,
  $matchAllWordForms,$forward,$wrap,$Format)
  if($wordFound) [script continues as above]

But this is met with the following errors:

You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:27 char:31
+  $section = $doc.sections.item <<<< (1);
    + CategoryInfo          : InvalidOperation: (item:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:29 char:33
+  $header = $section.headers.Item <<<< (1);
    + CategoryInfo          : InvalidOperation: (Item:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:33 char:26
+  $null = $range.movestart <<<< ()
    + CategoryInfo          : InvalidOperation: (movestart:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:35 char:34
+  $wordFound = $range.find.execute <<<< ($findText,$matchCase,
    + CategoryInfo          : InvalidOperation: (execute:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

Is this the right way to go or is it a dead end?

Upvotes: 0

Views: 1780

Answers (2)

L.Pump

Reputation: 3

For anyone looking at this question in the future: something isn't quite working with my code above. It seems to return a false positive, setting $wordFound = 1 regardless of the document's content, so every document found under $Path is listed.

Changing the arguments passed to Find.Execute doesn't seem to change the value of $wordFound. I believe the problem lies in my $range, as it is the only place where I get errors when stepping through the code.

Errors listed:

You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\Powershell\count.ps1:24 char:58
+  $range = $document.content.Structures.First.Headers.Item <<<< (1).range.Text
    + CategoryInfo          : InvalidOperation: (Item:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

Exception calling "MoveStart" with "0" argument(s): "The RPC server is unavailable. (Exception from HRESULT: 0x800706BA)"
At C:\Users\user\Desktop\Powershell\count.ps1:25 char:26
+  $null = $range.MoveStart <<<< ()
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : ComMethodCOMException

You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\Powershell\count.ps1:26 char:34
+  $wordFound = $range.Find.Execute <<<< ($findText,$matchCase,
    + CategoryInfo          : InvalidOperation: (Execute:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull
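One plausible reading of these errors, offered as a guess rather than a confirmed diagnosis: when the Find.Execute line throws, the assignment never happens, so $wordFound keeps whatever value it had from an earlier iteration or an earlier run in the same session. In addition, ending the $range expression in .Text would produce a plain string, which has no MoveStart or Find members. The sketch below resets $wordFound on every pass, keeps $range as a Range object, and reads the sections from $document (the opened Word document) rather than $doc (the file from Get-ChildItem). The folder path is hypothetical; the Find.Execute arguments are the same values, in the same order, as in the question's script.

```powershell
$path     = 'C:\Temp\Docs'      # hypothetical folder, replace with your own
$findText = 'specific text'

$application = New-Object -ComObject Word.Application
$application.Visible = $false
try {
    foreach ($doc in Get-ChildItem -Path $path -Recurse -Include *.docx) {
        # Reset first, so a failed call cannot leave a stale value behind.
        $wordFound = $false
        $document  = $application.Documents.Open($doc.FullName)
        try {
            # Primary header (index 1) of the first section, kept as a Range.
            $range = $document.Sections.Item(1).Headers.Item(1).Range
            # FindText, MatchCase, MatchWholeWord, MatchWildcards,
            # MatchSoundsLike, MatchAllWordForms, Forward, Wrap
            $wordFound = $range.Find.Execute($findText, $false, $true, $false, $false, $false, $true, 1)
        }
        finally {
            $document.Close()
        }
        if ($wordFound) { $doc.FullName }
    }
}
finally {
    $application.Quit()
}
```

If the errors persist after this change, the stale-variable guess is probably wrong and the header lookup itself deserves a closer look.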

Upvotes: 0

Micky Balladelli

Reputation: 9991

If you want the header text, you can try the following:

$document.content.Sections.First.Headers.Item(1).range.text
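Building on the one-liner above, here is a minimal sketch of how the header text could be checked across a whole document. It assumes $document is the object returned by $application.Documents.Open(...) in the question's script; the helper name Test-HeaderText is hypothetical. It walks every section and the three header types Word defines (1 = primary, 2 = first page, 3 = even pages) and does a plain case-insensitive substring test on Range.Text, so unlike Find.Execute it does not honour whole-word matching.

```powershell
function Test-HeaderText {
    param(
        $Document,            # Word Document COM object (assumed already open)
        [string]$FindText     # text to look for
    )

    foreach ($section in $Document.Sections) {
        foreach ($index in 1, 2, 3) {
            $header = $section.Headers.Item($index)
            # Skip header types this section does not actually use.
            if (-not $header.Exists) { continue }

            $text = [string]$header.Range.Text
            if ($text.IndexOf($FindText, [System.StringComparison]::OrdinalIgnoreCase) -ge 0) {
                return $true
            }
        }
    }
    return $false
}

# Possible use inside the question's loop, in place of the Find.Execute call:
# $wordFound = Test-HeaderText -Document $document -FindText $findText
```

Checking all sections matters when a document restarts its header partway through; if only the first section is of interest, the loop can be reduced to the answer's single expression.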

Upvotes: 1
