Reputation: 3
I am in need of checkingh a larger number of word documents (doc & docx) for a specific text and found a great tutorial and script by the Scripting Guys;
The script reads all documents in a directory and gives the following output;
This is all I need, however their code doesn't seem to actually check the headers of any document, which incidentally is where the specific text I'm looking for is located. Any tips & tricks in making the script read header text would make me very happy.
An alternative solution might be to remove the formatting so that the header text becomes part of the rest of the document? Is this possible?
Edit: Forgot to link the script:
[cmdletBinding()]
Param(
$Path = "C:\Users\use\Desktop\"
) #end param
$matchCase = $false
$matchWholeWord = $true
$matchWildCards = $false
$matchSoundsLike = $false
$matchAllWordForms = $false
$forward = $true
$wrap = 1
$application = New-Object -comobject word.application
$application.visible = $False
$docs = Get-childitem -path $Path -Recurse -Include *.docx
$findText = "specific text"
$i = 1
$totalwords = 0
$totaldocs = 0
Foreach ($doc in $docs)
{
Write-Progress -Activity "Processing files" -status "Processing $($doc.FullName)" -PercentComplete ($i /$docs.Count * 100)
$document = $application.documents.open($doc.FullName)
$range = $document.content
$null = $range.movestart()
$wordFound = $range.find.execute($findText,$matchCase,
$matchWholeWord,$matchWildCards,$matchSoundsLike,
$matchAllWordForms,$forward,$wrap)
if($wordFound)
{
$doc.fullname
$document.Words.count
$totaldocs ++
$totalwords += $document.Words.count
} #end if $wordFound
$document.close()
$i++
} #end foreach $doc
$application.quit()
"There are $totaldocs and $($totalwords.tostring('N')) words"
#clean up stuff
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($range) | Out-Null
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($document) | Out-Null
[System.Runtime.InteropServices.Marshal]::ReleaseComObject($application) | Out-Null
Remove-Variable -Name application
[gc]::collect()
[gc]::WaitForPendingFinalizers()
EDIT 2: My colleague got the idea to call on the section header instead;
Foreach ($doc in $docs)
{
Write-Progress -Activity "Processing files" -status "Processing $($doc.FullName)" -PercentComplete ($i /$docs.Count * 100)
$document = $application.documents.open($doc.FullName)
# Load first section of the document
$section = $doc.sections.item(1);
# Load header
$header = $section.headers.Item(1);
# Set the range to be searched to only Header
$range = $header.content
$null = $range.movestart()
$wordFound = $range.find.execute($findText,$matchCase,
$matchWholeWord,$matchWildCards,$matchSoundsLike,
$matchAllWordForms,$forward,$wrap,$Format)
if($wordFound) [script continues as above]
But this is met with the following errors:
You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:27 char:31
+ $section = $doc.sections.item <<<< (1);
+ CategoryInfo : InvalidOperation: (item:String) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:29 char:33
+ $header = $section.headers.Item <<<< (1);
+ CategoryInfo : InvalidOperation: (Item:String) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:33 char:26
+ $null = $range.movestart <<<< ()
+ CategoryInfo : InvalidOperation: (movestart:String) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\count_mod.ps1:35 char:34
+ $wordFound = $range.find.execute <<<< ($findText,$matchCase,
+ CategoryInfo : InvalidOperation: (execute:String) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
Is this the right way to go or is it a dead end?
Upvotes: 0
Views: 1780
Reputation: 3
For anyone looking at this question in the future: Something isn't quite working with my code above. It seems to return a false positive and puts $wordFound = 1 regardless of the content of the document thus listing all documents found under $path.
Editing the variables within Find.Execute doesn't seem to change the outcome of $wordFound. I believe the problem might be found in my $range, as it is the only place I get errors in while going through the code step by step.
Errors listed;
You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\Powershell\count.ps1:24 char:58
+ $range = $document.content.Structures.First.Headers.Item <<<< (1).range.Text
+ CategoryInfo : InvalidOperation: (Item:String) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
Exception calling "MoveStart" with "0" argument(s): "The RPC server is unavailable. (Exception from HRESULT: 0x800706BA)"
At C:\Users\user\Desktop\Powershell\count.ps1:25 char:26
+ $null = $range.MoveStart <<<< ()
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : ComMethodCOMException
You cannot call a method on a null-valued expression.
At C:\Users\user\Desktop\Powershell\count.ps1:26 char:34
+ $wordFound = $range.Find.Execute <<<< ($findText,$matchCase,
+ CategoryInfo : InvalidOperation: (Execute:String) [], RuntimeException
+ FullyQualifiedErrorId : InvokeMethodOnNull
Upvotes: 0
Reputation: 9991
if you want the header text, you can try the following:
$document.content.Sections.First.Headers.Item(1).range.text
Upvotes: 1