schnipdip

Reputation: 151

Recursively search a directory for files whose content matches a regex and collect the paths of matching files in an array

$locations = Get-ChildItem $readLoc -recurse | ? {!$_.psiscontainer} | select-object name | %{$e = $_.name; get-content $e}

$array = @()

for($i = 0; $i -lt $locations.length; $i++){
    #if($locations.name[$i].length -eq "9"){
        $paths = Resolve-Path $locations.fullname[$i]
        $paths.path
        get-content $locations.name[$i]
        #$array += $paths.path 
    #}
}

I need to iterate through each file in the file system and open each file. I am checking to see if a string within the file matches a regular expression and then output the full path to that file into an array.

However, $locations isn't accepting the get-content.

get-content : Cannot find path

'C:\Users\xxxxxx\Documents\files\powershell\OWASP_ApplicationThreatModeling.docx'
because it does not exist.
At line:1 char:89
+ ... .psiscontainer} | select-object name |%{$e = $_.name; get-content $e}
+                                                           ~~~~~~~~~~~~~~
    + CategoryInfo          : ObjectNotFound: (C:\Users\p61782...atModeling.docx:String) [Get-Content], ItemNotFoundEx
   ception
    + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.GetContentCommand.

Upvotes: 2

Views: 4117

Answers (2)

mklement0

Reputation: 439767

As TheMadTechnician suggests, it's more efficient to use Select-String to perform the regex matching:

$locations = Get-ChildItem $readLoc -File -Recurse |
               Select-String -List -Pattern '^\d{3}-?\d{2}-?\d{4}$' | 
                 Select-Object -ExpandProperty Path

Note:

  • The regex passed to -Pattern is a simplified version of the one linked to in a comment. Note how the regex is enclosed in '...' rather than "..." so as to prevent inadvertent up-front interpretation of the string by PowerShell.

  • Get-ChildItem $readLoc -File -Recurse recursively enumerates all files in the target directory's subtree. The -File switch (along with its counterpart, -Directory) is available in PSv3+ and makes your ? {!$_.psiscontainer} filter unnecessary.

  • Select-String can operate on the content of files piped via Get-ChildItem and performs regex matching by default:

    • -List tells Select-String to only return the first match from each input file (if any).
  • Select-String returns match-information objects whose .Path property contains the full path of the input file, so Select-Object -ExpandProperty Path is used to output just the path of any file that contains at least 1 match.

Overall, variable $locations therefore receives the array of full paths of those files in which at least 1 line matches the regex of interest.
Note that PowerShell automatically collects a command's output in an array when you assign it to a variable, provided the output comprises more than 1 element; a single output object is stored as-is rather than wrapped in an array.
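
If you always want an array, even when only one file (or none) matches, you can wrap the pipeline in the array-subexpression operator, @(...). The following is a minimal sketch rather than a definitive solution; the temporary directory and the two sample files are hypothetical and exist only to make the snippet self-contained:

```powershell
# Hypothetical sample data (not from the question): a temp directory with
# one file that matches the regex and one that does not.
$readLoc = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $readLoc -Force | Out-Null
Set-Content -Path (Join-Path $readLoc 'match.txt') -Value '123-45-6789'
Set-Content -Path (Join-Path $readLoc 'other.txt') -Value 'nothing to see here'

# @(...) guarantees an array, even if zero or one file matches.
$locations = @(
    Get-ChildItem $readLoc -File -Recurse |
        Select-String -List -Pattern '^\d{3}-?\d{2}-?\d{4}$' |
        Select-Object -ExpandProperty Path
)

$locations.Count         # number of files with at least 1 matching line
$locations -is [array]   # an array regardless of the number of matches
```

Without @(...), a single matching file would leave $locations holding a plain string, which matters if later code indexes into it or relies on array behavior.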


As for what you tried:

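  • Your immediate problem was that you passed .Name - i.e., a mere file name - to Get-Content rather than .FullName. A mere file name is resolved relative to the current location, so any file that lives in a different directory (such as a subdirectory of $readLoc) is not found.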

  • Furthermore, your apparent intent was to collect file-info objects in array $locations, whereas your pipeline actually produced the contents of all files (as an array of lines).
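
If you prefer to stay close to your original loop-based approach, the two points above could be addressed roughly as follows. This is only a sketch under stated assumptions: the temporary directory, the sample files, and the variable names $pattern and $matchingLines are hypothetical, and the regex is the simplified one used above:

```powershell
# Hypothetical sample data (not from the question): a temp tree with a nested file.
$readLoc = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$subDir  = Join-Path $readLoc 'sub'
New-Item -ItemType Directory -Path $subDir -Force | Out-Null
Set-Content -Path (Join-Path $subDir 'match.txt')  -Value 'first line', '123-45-6789'
Set-Content -Path (Join-Path $readLoc 'other.txt') -Value 'no match here'

$pattern = '^\d{3}-?\d{2}-?\d{4}$'

# Keep the file-info objects themselves; do not reduce them to .Name.
$files = Get-ChildItem $readLoc -File -Recurse

$array = @(
    foreach ($file in $files) {
        # @(...) ensures an array of lines, so -match acts as a filter
        # that returns only the lines matching the regex.
        $matchingLines = @(Get-Content -LiteralPath $file.FullName) -match $pattern
        if ($matchingLines.Count -gt 0) { $file.FullName }   # emit the full path
    }
)

$array   # full paths of the files containing at least 1 matching line
```

Letting the foreach statement emit the paths and capturing them in one assignment avoids growing the array with += on every iteration; the Select-String solution above remains the more efficient option.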

Upvotes: 3

TheMadTechnician

Reputation: 36332

You need to work with the FullName property. Right now you're stripping that with your Select-Object command.

$locations = Get-ChildItem $readLoc -recurse | ? {!$_.psiscontainer}

for($i = 0; $i -lt $locations.length; $i++){
    $locations[$i].fullname
    get-content $locations[$i].fullname
}

Upvotes: 0
