Reputation: 31
I have a group of .txt files that contain one or two of the following strings.
"red", "blue", "green", "orange", "purple", ....
many more (50+) possibilities in the list.
If it helps, I can tell if the .txt file contains one or two items, but don't know which one/ones they are. The string patterns are always on their own line.
I'd like the script to tell me specifically which one or two string matches (from the master list) it found, and the order in which it found them. (Which one was first)
Since I have a lot of text files to search, I'd like to write the output results to a CSV file as I search.
FILENAME1,first_match,second_match
file1.txt,blue,red
file2.txt,red, blue
file3.txt,orange,
file4.txt,purple,red
file5.txt,purple,
...
I've tried using many individual Select-Strings
returning Boolean results to set variables with any matches found, but with the number of possible strings it gets ugly real fast. My search results for this issue has provided me with no new ideas to try. (I'm sure I'm not asking in the correct way)
Do I need to loop through each line of text in each file?
Am I stuck with the process of elimination method by checking for the existence of each search string?
I'm looking for a more elegant approach to this problem. (if one exists)
Upvotes: 2
Views: 2104
Reputation: 58491
Not very intuïtive but elegant...
$regex = "(purple|blue|red)"
Get-ChildItem $env:TEMP\test\*.txt | Foreach-Object{
$result = $_.FullName
switch -Regex -File $_
{
$regex {$result = "$($result),$($matches[1])"}
}
$result
}
C:\Users\Lieven Keersmaekers\AppData\Local\Temp\test\file1.txt,blue,red
C:\Users\Lieven Keersmaekers\AppData\Local\Temp\test\file2.txt,red,blue
file1
contains first blue
, then red
file2
contains first red
, then blue
Upvotes: 4
Reputation: 54981
You can use regex to search to get index (startpos. in line) combine with Select-String
which returns linenumber and you're good to go.
Select-String
supports an array as value for -Pattern
, but unfortunately it stops on a line after first match even when you use -AllMatches
(bug?). Because of this we have to search one time per word/pattern. Try:
#List of words. Had to escape them because Select-String doesn't return Matches-objects (with Index/location) for SimpleMatch
$words = "purple","blue","red" | ForEach-Object { [regex]::Escape($_) }
#Can also use a list with word/sentence per line using $words = Get-Content patterns.txt | % { [regex]::Escape($_.Trim()) }
#Get all files to search
Get-ChildItem -Filter "test.txt" -Recurse | Foreach-Object {
#Has to loop words because Select-String -Pattern "blue","red" won't return match for both pattern. It stops on a line after first match
foreach ($word in $words) {
$_ | Select-String -Pattern $word |
#Select the properties we care about
Select-Object Path, Line, Pattern, LineNumber, @{n="Index";e={$_.Matches[0].Index}}
}
} |
#Sort by File (to keep file-matches together), then LineNumber and Index to get the order of matches
Sort-Object Path, LineNumber, Index |
Export-Csv -NoTypeInformation -Path Results.csv -Encoding UTF8
Results.csv
"Path","Line","Pattern","LineNumber","Index"
"C:\Users\frode\Downloads\test.txt","file1.txt,blue,red","blue","3","10"
"C:\Users\frode\Downloads\test.txt","file1.txt,blue,red","red","3","15"
"C:\Users\frode\Downloads\test.txt","file2.txt,red, blue","red","4","10"
"C:\Users\frode\Downloads\test.txt","file2.txt,red, blue","blue","4","15"
"C:\Users\frode\Downloads\test.txt","file4.txt,purple,red","purple","6","10"
"C:\Users\frode\Downloads\test.txt","file4.txt,purple,red","red","6","17"
"C:\Users\frode\Downloads\test.txt","file5.txt,purple,","purple","7","10"
Upvotes: 1