Nimster

Reputation: 31

Searching for multiple strings in multiple files in PowerShell

First of all, I've got a reliable search (thanks to some help on Stack Overflow) that checks for occurrences of several strings on the same line across many log files.

I've now been tasked with adding more searches. Since there are about 20 files and about a dozen search criteria, I don't want to have to access these files over 200 times. I believe the best way of doing this is with an array, but so far every method I've tried has failed.

The search criteria are made up of the date, which obviously changes every day, a fixed string (ERROR) and a unique Java class name. Here is what I have:

        $dateStr = Get-Date -Format "yyyy-MM-dd"
        $errword = 'ERROR'
        $word01 = [regex]::Escape('java.util.exception')   
    
        $pattern01 = "${dateStr}.+${errword}.+${word01}"
    
        $count01 = (Get-ChildItem -Filter $logdir -Recurse | Select-String -Pattern $pattern01 -AllMatches |ForEach-Object Matches |Measure-Object).Count
        Add-Content $outfile  "$dateStr,$word01,$count01"

The easy way to expand this is to have a separate three-command entry (set word, set pattern, then search) for each class I want to search against. I've done that and it works, but it's not elegant, and it means we're processing >200 files to run the search. I've tried to read the Java classes in from a simple text file with mixed results, but it's the only thing I've been able to get to work in order to simplify the search for 12 different patterns.

Upvotes: 1

Views: 724

Answers (1)

mklement0

Reputation: 437198

iRon provided an important pointer: Select-String can accept an array of patterns to search for, and reports matches for lines that match any one of them.
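
To see the multi-pattern behavior in isolation, here is a minimal sketch that runs against in-memory strings instead of files. The sample lines and the two patterns are made up for illustration; the only features relied on are the array form of -Pattern and the Pattern property of the resulting match objects.

```powershell
# Hypothetical sample lines; in practice these would come from the log files.
$lines = @(
  '2024-01-01 ERROR java.util.exception: boom'
  '2024-01-01 INFO  all good'
  '2024-01-01 ERROR java.util.exception2: bang'
)

# Two patterns passed as an array; a line is reported if any one of them matches.
# The trailing colon keeps the first pattern from also matching 'exception2'.
$patterns = 'ERROR.+java\.util\.exception:', 'ERROR.+java\.util\.exception2:'

$found = $lines | Select-String -Pattern $patterns

# Each result carries the pattern that matched in its .Pattern property.
$found | ForEach-Object { '{0} <= {1}' -f $_.Pattern, $_.Line }
```

One thing to watch for: as far as I can tell, a line is attributed to the first pattern in the array that matches it, so if one class name is a prefix of another, the patterns should be made specific enough not to overlap.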

You can then get away with a single Select-String call, combined with a Group-Object call that allows you to group all matching lines by which pattern matched:

# Create the input file with class names to search for.
@'
java.util.exception
java.util.exception2
'@ > classNames.txt

# Construct the array of search patterns,
# and add them to a map (hashtable) that maps each
# pattern to the original class name.
$dateStr = Get-Date -Format 'yyyy-MM-dd'
$patternMap = [ordered] @{}
Get-Content classNames.txt | ForEach-Object {
  $patternMap[('{0}.+{1}.+{2}' -f $dateStr, 'ERROR', [regex]::Escape($_))] = $_
}

# Search across all files, using multiple patterns.
Get-ChildItem -File -Recurse $logdir | Select-String @($patternMap.Keys) |
  # Group matches by the matching pattern.
  Group-Object Pattern |
    # Output the result; send to `Set-Content` as needed.
    ForEach-Object { '{0},{1},{2}' -f $dateStr, $patternMap[$_.Name], $_.Count }
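
Group-Object only emits groups for patterns that actually matched, so a class with no hits today produces no output line at all. If a row with a count of 0 is wanted for such classes, one possible variation is sketched below. It is self-contained and runs against a throwaway directory; the directory, the file names and the com.example.* class names are all hypothetical.

```powershell
# Hypothetical throwaway log directory with two sample files.
$logdir = Join-Path ([IO.Path]::GetTempPath()) ('logdemo_' + [guid]::NewGuid())
$null = New-Item -ItemType Directory -Path $logdir
$dateStr = Get-Date -Format 'yyyy-MM-dd'
"$dateStr 10:00 ERROR com.example.Alpha failed", "$dateStr 10:01 INFO ok" |
  Set-Content (Join-Path $logdir 'a.log')
"$dateStr 11:00 ERROR com.example.Alpha failed again" |
  Set-Content (Join-Path $logdir 'b.log')

# Map each search pattern to its (made-up) class name, as in the answer's code.
$classNames = 'com.example.Alpha', 'com.example.Beta'
$patternMap = [ordered] @{}
foreach ($name in $classNames) {
  $patternMap[('{0}.+{1}.+{2}' -f $dateStr, 'ERROR', [regex]::Escape($name))] = $name
}

# Single pass over the files; remember the count per matched pattern.
$counts = @{}
Get-ChildItem -File -Recurse $logdir | Select-String @($patternMap.Keys) |
  Group-Object Pattern |
    ForEach-Object { $counts[$_.Name] = $_.Count }

# Emit one row per class; a missing entry ($null) becomes 0 via the [int] cast.
$report = foreach ($pattern in $patternMap.Keys) {
  '{0},{1},{2}' -f $dateStr, $patternMap[$pattern], [int] $counts[$pattern]
}
$report

# Clean up the throwaway directory.
Remove-Item -Recurse -Force $logdir
```

The files are still read only once; the extra loop runs over the dozen or so patterns, not over the files.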

Note:

  • $logDir, as the name suggests, is presumed to refer to a directory in which to (recursively) search for log files. Passing it to -Filter wouldn't work, so I've removed -Filter, which makes $logDir bind positionally to the -Path parameter. -File limits the results to files. If other types of files are also present, add a -Filter argument as needed, e.g. -Filter *.log

  • Select-String's -AllMatches switch is generally not required - you only need it if any of the patterns can match multiple times per line and you want to capture all of those matches.

  • Using @(...), the array-subexpression operator around the collection of the hashtable's keys, $patternMap.Keys, i.e. the search patterns, is required purely for technical reasons: it forces the collection to be convertible to an array of strings ([string[]]), which is how the -Pattern parameter is typed.

    • The need for @(...) is surprising, and may be indicative of a bug, as of PowerShell 7.2; see GitHub issue #16061.
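
The notes on -AllMatches and @(...) can be checked on throwaway data. The following is only a sketch with made-up values: it compares the number of matches reported for one line with and without -AllMatches, and shows that wrapping an ordered hashtable's keys in @(...) yields a plain array.

```powershell
# A hypothetical line in which the pattern occurs twice.
$line = 'ERROR first ERROR second'

# Without -AllMatches, only the first match on the line is recorded.
$firstOnly = ($line | Select-String -Pattern 'ERROR').Matches.Count

# With -AllMatches, every match on the line is recorded.
$allOfThem = ($line | Select-String -Pattern 'ERROR' -AllMatches).Matches.Count

"without -AllMatches: $firstOnly; with -AllMatches: $allOfThem"

# @(...) turns the key collection of an ordered hashtable into a regular array.
$map = [ordered] @{ 'a.+b' = 'first'; 'c.+d' = 'second' }
$keys = @($map.Keys)
$keys.GetType().Name
```

So when the goal is to count matching lines, as in the code above, -AllMatches makes no difference; it only matters when occurrences within a line are counted.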

Upvotes: 1
