PS Newb

Reputation: 31

PowerShell text search - multiple matches

I have a group of .txt files that contain one or two of the following strings.

"red", "blue", "green", "orange", "purple", .... many more (50+) possibilities in the list.

If it helps, I can tell whether a .txt file contains one or two items, but I don't know which ones they are. The string patterns are always on their own line.

I'd like the script to tell me specifically which one or two string matches (from the master list) it found, and the order in which it found them. (Which one was first)

Since I have a lot of text files to search, I'd like to write the output results to a CSV file as I search.

FILENAME1,first_match,second_match

file1.txt,blue,red
file2.txt,red, blue
file3.txt,orange,
file4.txt,purple,red
file5.txt,purple,
...

I've tried using many individual Select-String calls returning Boolean results to set variables for any matches found, but with the number of possible strings it gets ugly real fast. My searches on this issue have given me no new ideas to try. (I'm sure I'm not asking in the correct way.)

Do I need to loop through each line of text in each file?

Am I stuck with the process of elimination method by checking for the existence of each search string?

I'm looking for a more elegant approach to this problem. (if one exists)

Upvotes: 2

Views: 2104

Answers (2)

Lieven Keersmaekers

Reputation: 58491

Not very intuitive, but elegant...

The following switch statement

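# One alternation with a capture group; extend it with the rest of the list
$regex = "(purple|blue|red)"

Get-ChildItem $env:TEMP\test\*.txt | ForEach-Object {
    $result = $_.FullName
    # switch -File reads the file line by line, so matches are appended in file order
    switch -Regex -File $_.FullName
    {
        $regex { $result = "$result,$($matches[1])" }
    }
    $result
}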

returns

C:\Users\Lieven Keersmaekers\AppData\Local\Temp\test\file1.txt,blue,red
C:\Users\Lieven Keersmaekers\AppData\Local\Temp\test\file2.txt,red,blue

where

  • file1 contains first blue, then red
  • file2 contains first red, then blue
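
To get the CSV layout the question asks for (file name, first match, second match), the same switch loop could emit objects instead of a joined string. This is only a minimal sketch under some assumptions: `Get-ColorMatches`, its parameters, and the `results.csv` file name are hypothetical names introduced here, and the pattern is anchored with `^` and `$` because the question says the strings always sit on their own line.

```powershell
# Hypothetical helper: returns one object per .txt file with its first two matches, in file order.
function Get-ColorMatches {
    param(
        [string]$Folder,     # folder holding the .txt files (assumption)
        [string[]]$Colors    # master list of strings to look for
    )

    # Build one anchored alternation, e.g. ^\s*(red|blue|green)\s*$
    $escaped = $Colors | ForEach-Object { [regex]::Escape($_) }
    $regex = '^\s*(' + ($escaped -join '|') + ')\s*$'

    Get-ChildItem -Path $Folder -Filter '*.txt' | Sort-Object Name | ForEach-Object {
        $file = $_      # keep a reference; inside switch, $_ is the current line
        $found = @()
        # switch -File reads the file line by line, so matches arrive in file order
        switch -Regex -File $file.FullName {
            $regex { $found += $matches[1] }
        }
        [pscustomobject]@{
            FileName     = $file.Name
            first_match  = if ($found.Count -gt 0) { $found[0] } else { '' }
            second_match = if ($found.Count -gt 1) { $found[1] } else { '' }
        }
    }
}

# Possible usage (folder and output file name are assumptions):
# Get-ColorMatches -Folder "$env:TEMP\test" -Colors 'red','blue','purple' |
#     Export-Csv -NoTypeInformation -Path results.csv -Encoding UTF8
```

Emitting objects rather than strings lets Export-Csv handle the quoting and the header row, and a file with a single match simply gets an empty second column.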

Upvotes: 4

Frode F.

Reputation: 54981

You can use a regex search to get the index (the start position within the line) and combine it with Select-String, which returns the line number, and you're good to go.

Select-String accepts an array as the value for -Pattern, but unfortunately it stops on a line after the first match, even when you use -AllMatches (bug?). Because of this, we have to search once per word/pattern. Try:

# List of words. They are escaped and used as regex patterns because Select-String doesn't return Matches objects (with Index/location) for -SimpleMatch
$words = "purple","blue","red" | ForEach-Object { [regex]::Escape($_) }
# Can also read one word/sentence per line from a file: $words = Get-Content patterns.txt | ForEach-Object { [regex]::Escape($_.Trim()) }

# Get all files to search
Get-ChildItem -Filter "test.txt" -Recurse | ForEach-Object {
    # Loop over the words because Select-String -Pattern "blue","red" won't return a match for both patterns; it stops on a line after the first match
    foreach ($word in $words) {
        $_ | Select-String -Pattern $word |
        # Select the properties we care about
        Select-Object Path, Line, Pattern, LineNumber, @{n="Index";e={$_.Matches[0].Index}}
    }
} |
# Sort by Path (to keep each file's matches together), then by LineNumber and Index to get the order of the matches
Sort-Object Path, LineNumber, Index |
Export-Csv -NoTypeInformation -Path Results.csv -Encoding UTF8

Results.csv

"Path","Line","Pattern","LineNumber","Index"
"C:\Users\frode\Downloads\test.txt","file1.txt,blue,red","blue","3","10"
"C:\Users\frode\Downloads\test.txt","file1.txt,blue,red","red","3","15"
"C:\Users\frode\Downloads\test.txt","file2.txt,red, blue","red","4","10"
"C:\Users\frode\Downloads\test.txt","file2.txt,red, blue","blue","4","15"
"C:\Users\frode\Downloads\test.txt","file4.txt,purple,red","purple","6","10"
"C:\Users\frode\Downloads\test.txt","file4.txt,purple,red","red","6","17"
"C:\Users\frode\Downloads\test.txt","file5.txt,purple,","purple","7","10"

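The output above has one row per match, while the question asks for one row per file. A possible way to collapse it is to group the rows by Path and keep the first two patterns in LineNumber/Index order. This is a sketch, not part of the original answer: `ConvertTo-MatchSummary` and `Summary.csv` are hypothetical names, and the casts to `[int]` assume the rows may come from Import-Csv, which returns every column as a string.

```powershell
# Hypothetical helper: collapses per-match rows (Path, Pattern, LineNumber, Index) into one row per file.
function ConvertTo-MatchSummary {
    param(
        [object[]]$Rows    # e.g. the result of Import-Csv Results.csv
    )

    $Rows | Group-Object -Property Path | ForEach-Object {
        $group = $_
        # Cast to [int] so that line 10 sorts after line 2 even when the values are strings
        $ordered = @($group.Group | Sort-Object { [int]$_.LineNumber }, { [int]$_.Index })
        [pscustomobject]@{
            Path         = $group.Name
            # Pattern holds the escaped regex, so unescape it to get the original word back
            first_match  = if ($ordered.Count -gt 0) { [regex]::Unescape($ordered[0].Pattern) } else { '' }
            second_match = if ($ordered.Count -gt 1) { [regex]::Unescape($ordered[1].Pattern) } else { '' }
        }
    }
}

# Possible usage (file names are assumptions):
# ConvertTo-MatchSummary -Rows (Import-Csv Results.csv) |
#     Export-Csv -NoTypeInformation -Path Summary.csv -Encoding UTF8
```

Note that in the sample run every match comes from the same test.txt, so grouping that particular output would yield a single row; the grouping is meant for the real case where each file is searched separately.
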
Upvotes: 1
