Reputation: 209
I'm mainly looking for some pointers and a little bit of code. My task is to search a number of files for different strings and create a log of the matches.
Initially I parsed each file looking for a single string, but that was too slow once I had thousands of files of around 1 MB each. I would therefore like to open each file once and scan it for multiple strings, attributing each match in the log to the corresponding rule.
I have created the following rules file:
{"Logs": {
    "Component": {
        "Files": [
            {
                "name": "test.txt",
                "encoding": "UTF8",
                "rules": [{
                    "Rule1": "this is text"
                }]
            },
            {
                "name": "test2.txt",
                "encoding": "UTF8",
                "rules": [{
                    "Rule2": "this is text1",
                    "Rule3": "this is text3"
                }]
            }
        ]
    }
}}
Maybe that needs to be improved; it can be changed. The following PowerShell uses the rules to search through the files:
Function ParseFile($Files){
    write-host "Parsing file" $Files.Name "for text " $Files.rules
    Get-ChildItem "." -Recurse -Filter $Files.Name |
    Foreach-Object {
        write-host $_.FullName
        Foreach($line in Get-Content $_.FullName -encoding $Files.encoding ) {
            ##Check if the current line from file matches a rule from the $Files.Rules array.
            ##If so log the file, line and rule ID to a CSV file. E.g.:
            ##RuleID, RuleString, LineFromFile, FileName
        }
    }
}
$JSON = Get-Content -Raw -Path rule.json | ConvertFrom-Json
foreach ($files in $JSON.Logs.Component.Files ){
    write-host $files.name
    write-host "============================="
    ParseFile $files
}
Does the above make sense as the quickest way to search and classify? I'm not quite sure how to approach the commented section. I assume something like $line -in $Files.rules, but I don't think the array is quite right for this.
Any suggestions are welcome, and thanks in advance.
Upvotes: 1
Views: 1016
Reputation: 54911
Here's an alternative using regex. I modified the JSON to make it easier to parse. The original JSON can still work if needed, by getting RuleID and RuleString from the Name and Value properties in $_.rules.psobject.properties.
This solution requires RuleID to be a single word, because it is used as the name of a regex capture group.
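The psobject.properties approach mentioned above could look roughly like this. It is a minimal sketch, assuming the original rule shape from the question (property name is the rule ID, property value is the search string); the inline JSON and the $rulePairs variable are illustrative only.

```powershell
# Assumption: the question's original rule shape, trimmed to one "rules" array.
$json = @'
{"rules":[{"Rule2":"this is text1","Rule3":"this is text3"}]}
'@ | ConvertFrom-Json

# Flatten every property of every rule object into RuleID/Rule pairs.
$rulePairs = foreach ($ruleObject in $json.rules) {
    foreach ($property in $ruleObject.psobject.Properties) {
        [pscustomobject]@{
            RuleID = $property.Name   # e.g. Rule2
            Rule   = $property.Value  # e.g. this is text1
        }
    }
}

$rulePairs | Format-Table -AutoSize
```

Pairs in this form have the same RuleID and Rule properties that the pattern-building step of this answer expects.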
rules.json
{"Logs": {
    "Component": {
        "Files": [
            {
                "name": "test.txt",
                "encoding": "UTF8",
                "rules": [{
                    "RuleID": "Rule1",
                    "Rule": "this is text"
                }]
            },
            {
                "name": "test2.txt",
                "encoding": "UTF8",
                "rules": [
                    {
                        "RuleID": "Rule2",
                        "Rule": "this is text1"
                    },
                    {
                        "RuleID": "Rule3",
                        "Rule": "this is text3"
                    }
                ]
            }
        ]
    }
}}
Code:
$JSON = Get-Content -Raw -Path rules.json | ConvertFrom-Json
$JSON.Logs.Component.Files | ForEach-Object {
    $item = $_
    #Create regex-pattern: one named group per rule, joined with |
    $pattern = ($item.rules | ForEach-Object { "(?'$($_.RuleID)'$([regex]::Escape($_.Rule)))" }) -join '|'
    #Find matching files and search each one in a single pass
    Get-ChildItem -Path "." -Recurse -Filter $item.Name |
        Select-String -Pattern $pattern -Encoding $item.Encoding -AllMatches |
        ForEach-Object {
            #The named group that succeeded identifies the rule that matched
            $MatchedRule = $_.Matches.Groups | Where-Object { $_.Name -ne '0' -and $_.Success }
            New-Object -TypeName psobject -Property @{
                RuleID = $MatchedRule.Name
                RuleString = $MatchedRule.Value
                LineFromFile = $_.Line
                FileName = $_.Path
            }
        }
} | Export-Csv -Path results.csv -NoTypeInformation -Encoding UTF8
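One caveat with the code above: if a single line contains more than one rule string, $MatchedRule holds several groups, so RuleID and RuleString become arrays in that row. A possible variation, sketched here on an in-memory line rather than on files, emits one row per matched rule. The sample rules, the $line value and the variable names are assumptions for illustration, and GetGroupNames() is used so the sketch does not depend on the Group.Name property.

```powershell
# Hypothetical rules in the RuleID/Rule shape used by this answer.
$rules = @(
    [pscustomobject]@{ RuleID = 'Rule2'; Rule = 'this is text1' }
    [pscustomobject]@{ RuleID = 'Rule3'; Rule = 'this is text3' }
)

# Same pattern construction as above: one named group per rule.
$pattern = ($rules | ForEach-Object { "(?'$($_.RuleID)'$([regex]::Escape($_.Rule)))" }) -join '|'
$regex = [regex]::new($pattern)

# All group names except the implicit whole-match group '0'.
$groupNames = $regex.GetGroupNames() | Where-Object { $_ -ne '0' }

# Sample line (assumption) that contains both rule strings.
$line = 'this is text1 and this is text3'

# One output row per successful named group in each match.
$rows = foreach ($match in $regex.Matches($line)) {
    foreach ($name in $groupNames) {
        $group = $match.Groups[$name]
        if ($group.Success) {
            [pscustomobject]@{
                RuleID       = $name
                RuleString   = $group.Value
                LineFromFile = $line
            }
        }
    }
}

$rows | Format-Table -AutoSize
```

The same inner loops could replace the body of the ForEach-Object after Select-String, iterating over $_.Matches instead of $regex.Matches($line).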
results.csv:
"FileName","LineFromFile","RuleID","RuleString"
"D:\New folder\test.txt","foo this is text1 bar","Rule1","this is text"
"D:\New folder\test.txt","this is text3ss","Rule1","this is text"
"D:\New folder\test2.txt","foo this is text1 bar","Rule2","this is text1"
"D:\New folder\Test\test2.txt","this is text3ss","Rule3","this is text3"
Upvotes: 2
Reputation: 8442
I adjusted your JSON slightly:
{"Logs": {
    "Component": {
        "Files": [
            {
                "name": "test.txt",
                "encoding": "UTF8",
                "rules": ["this is text"]
            },
            {
                "name": "test2.txt",
                "encoding": "UTF8",
                "rules": ["this is text1",
                          "this is text3"]
            }
        ]
    }
}}
Using this, here is a possible solution:
$JSON = Get-Content -Raw -Path rules.json | ConvertFrom-Json
$JSON.Logs.Component.Files |
    ForEach-Object {
        $fileName = $_.Name
        $rules = $_.rules
        # Read the file once and test each line against every rule
        Get-Content $fileName -Encoding $_.encoding |
            ForEach-Object {
                for ($i = 0; $i -lt $rules.Count; $i++)
                {
                    # -like treats the rule as a wildcard pattern, so *, ? and [ ] are not literal
                    if ($_ -like "*$($rules[$i])*")
                    {
                        # The rule number is the 1-based position in the rules array
                        [PsCustomObject]@{RuleNumber = ($i+1);
                                          RuleString = $rules[$i];
                                          MatchingText = $_;
                                          File = $fileName} |
                            Export-Csv matches.csv -Append -NoTypeInformation
                    }
                }
            }
    }
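Two possible refinements of the loop above, sketched on in-memory lines rather than on files: matching with String.IndexOf keeps rule text literal (with -like, a rule such as file[1] would be read as a wildcard pattern), and collecting the rows first allows a single Export-Csv call instead of one append per match. The sample rules, sample lines and the temporary output path are assumptions for illustration.

```powershell
# Hypothetical rules and lines; 'file[1]' contains wildcard characters on purpose.
$rules = @('this is text1', 'file[1]')
$lines = @('foo this is text1 bar', 'see file[1] here', 'nothing to report')

# Collect all matches in memory first.
$rows = foreach ($line in $lines) {
    for ($i = 0; $i -lt $rules.Count; $i++) {
        # Literal, case-insensitive substring test (-like is also case-insensitive).
        if ($line.IndexOf($rules[$i], [System.StringComparison]::OrdinalIgnoreCase) -ge 0) {
            [pscustomobject]@{
                RuleNumber   = $i + 1
                RuleString   = $rules[$i]
                MatchingText = $line
            }
        }
    }
}

# Write the CSV once; the temp-folder path is only for this sketch.
$csvPath = Join-Path ([System.IO.Path]::GetTempPath()) 'matches.csv'
$rows | Export-Csv -Path $csvPath -NoTypeInformation
```

Whether the single write matters in practice depends on how many matches there are; with only a few hits per file the difference is likely small.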
Upvotes: 1