We're working with a text file that contains many different types of reports. Some of those reports need certain words changed; the rest should be copied over exactly as they are.
The file has to stay a single text file, so the idea is to move through the file, comparing the lines. If a line is found that is a "ReportType1", then we need to change some wording, so we go into an inner loop, extracting the data and changing words as we go. The inner loop ends when it reaches the report's footer, and processing should then move on to the next report.
We've tried -match, -like, -contains, -eq, but it never works quite like it's supposed to. We either get data that's been changed/reformatted that shouldn't be or we're only getting the header data.
Add-Type -AssemblyName System.Collections
Add-Type -AssemblyName System.Text.RegularExpressions
[System.Collections.Generic.List[string]]$content = @()
$inputFile = "drive\folder\inputfile.txt"
$outputFile = "drive\folder\outputfile.txt"
#This will retrieve the total number of lines in the file
$FileContent = Get-Content $inputFile
$FileLineCount = $FileContent | Measure-Object -Line
$TotalLines = $FileContent.Count
$TotalLines++ #Need to increase by one; the last line is blank
$startLine = 0
$lineCounter = 0
#Start reading the file; this is the Header section
#Number of lines may vary, but data is copied over word
#for word
foreach($line in Get-Content $inputfile)
{
    $startLine++
    If($line -match "FOOTER")
    {
        [void]$content.Add( $line )
        break
    }
    else
    {
        [void]$content.Add( $line )
    }
}
## ^^This section works perfectly
#Start reading the body of the file
Do {
    #Start reading from the current position
    #This should change with each report read
    $line = Get-Content $inputFile | select -Skip $startLine
    If($line -match "ReportType1") #If it's a ReportType1, some wording needs to be changed
    {
        #Start reading the file from the current position
        #Should loop through this record only
        foreach($line in Get-Content $inputFile | select -skip $startline)
        {
            If($line -match "FOOTER") #End of the current record
            {
                [void]$content.Add( $line )
                break #break out of the loop and continue reading the file from the new current position
            }
            elseif ($line -match "OldWord") #Have to replace a word on some lines
            {
                $line = $line.Replace("OldWord","NewWord")
                [void]$content.Add( $line )
            }
            else
            {
                [void]$content.Add( $line )
            }
            $startline++
        }
    }
    else
    {
        If($line -match "ReportType2") #ReportType2 can just be copied over line for line
        {
            #Start reading the file from the current position
            #Should loop through this record only
            foreach($line in Get-Content $inputFile | select -skip $startline)
            {
                If($line -match "FOOTER") #End of the current record
                {
                    [void]$content.Add( $line )
                    break #break out of the loop and continue reading the file from the new current position
                }
                else
                {
                    [void]$content.Add( $line )
                }
                $startline++
            }
        }
    }
    $startline++
} until ($startline -eq $TotalLines)
[System.IO.File]::WriteAllLines( $outputFile, $content ) | Out-Null
It sort of works, but we're getting some unexpected behavior. The reports look fine and all, but it's changing words in "ReportType2", even though the code isn't set up to do that. It's like it's only going through the first IF statement. But how can it be if the lines don't match up?
We know the $startline variable is increasing through the iterations, so it's not like it's stuck on one line. However, doing 'Write-Host' shows $line is always "ReportType1", which can't be true because the lines are showing up in the reports like they're supposed to be.
SAMPLE DATA:
<header data>
.
43 lines (although this can vary)
.
<footer>
<ReportType1>
.
x number of lines (varies)
.
<footer>
<ReportType2>
.
x number of lines (varies)
.
<footer>
And so on and so forth, until the end of the file. The different types of reports are all mixed together.
All we can figure is we're missing something, probably pretty obvious, that will get this to output the data correctly.
Any help is appreciated.
Reputation: 25041
The following should do what you want. Just replace the values for $oldword and $newword with your word replacements (these are case-insensitive for now) and the value of $report with the report header you want to update.
# Escape the search text so regex metacharacters are matched literally
$oldword = [regex]::Escape("Liability")
$newword = "Asset"
$report = "ReportType1"
$data = Get-Content Input.txt
$reports = $data | Select-String -Pattern $report -AllMatches
$footers = $data | Select-String -Pattern "FOOTER" -AllMatches
$startindex = 0
[collections.arraylist]$output = foreach ($line in $reports) {
    # Zero-based indexes of the report header and of the next footer after it
    $section = ($line.linenumber-1),($footers.linenumber.where({$_ -gt $line.linenumber},'First')[0]-1)
    # Copy any lines between the previous section and this one unchanged
    if ($startindex -lt $section[0]-1) {
        $data[$startindex..($section[0]-1)]
    }
    if ($startindex -eq $section[0]-1) {
        $data[$startindex]
    }
    # Replace the word only inside the matched report section
    $data[$section[0]..$section[1]] -replace $oldword,$newword
    $startindex = $section[1]+1
}
# Copy whatever remains after the last matched section
if ($startindex -eq $data.count-1) {
    [void]$output.Add($data[$startindex])
}
if ($startindex -lt $data.count-1) {
    # AddRange appends each remaining line as its own element
    $output.AddRange($data[$startindex..($data.count-1)])
}
$output | Set-Content Output.txt
Code Explanation:
The intention of $oldword is to be used in a regex replace operation. So any special regex characters will need to be escaped. I have opted to do that for you here. If you want to update the string that is to be replaced, you only need to update the characters between the quotes. This is case-insensitive when we pass it to the -replace operator.
$newword is simply the string that will replace whatever $oldword matches. It does not require any special handling unless the string contains special PowerShell characters. The replacement text will appear as is, including the case.
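As a small illustration of why the escaping matters, here is a sketch with made-up sample lines (they are not from the original file). An unescaped dot matches any character, so the first replace changes more than intended:

```powershell
# Hypothetical sample lines, assumed only for this illustration
$lines = 'Rate 1.5 applied', 'Rate 105 applied'

# Unescaped: '.' matches any character, so both lines are changed
$unescaped = $lines -replace '1.5', '2.0'

# Escaped: the pattern is matched as literal text, so only the first line changes
$pattern = [regex]::Escape('1.5')
$escaped = $lines -replace $pattern, '2.0'

# -replace is case-insensitive by default
$caseDemo = 'total liability' -replace 'LIABILITY', 'Asset'

$unescaped
$escaped
$caseDemo
```

If a case-sensitive replacement is ever needed, the -creplace operator can be used in place of -replace.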
$report is the name of the header of the section where you want to replace data. This is case-insensitive when we pass it to Select-String -Pattern.
$data is just the contents of the file as an array. Each line of the file is an indexed item in the array.
The first Select-String does regex matching with the regex pattern being -Pattern $Report. It uses regex because we did not specify the -SimpleMatch parameter. -AllMatches records every match within a line rather than only the first; Select-String returns every line in the file that contains $Report either way. The output is stored in $Reports. $Reports is an array of MatchInfo objects, which have properties that we will use like Line and LineNumber.
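To make the MatchInfo part concrete, here is a minimal sketch using an in-memory array instead of a file (the sample lines are assumptions for the illustration). It shows that LineNumber is 1-based, which is why the main code subtracts 1 before indexing into $data:

```powershell
# Hypothetical stand-in for the output of Get-Content
$data = '<header>', 'FOOTER', 'ReportType1', 'data', 'FOOTER'

# Each matching line comes back as a MatchInfo object
$found = @($data | Select-String -Pattern 'FOOTER')

# LineNumber counts from 1, array indexes count from 0
$numbers = $found.LineNumber
$firstLine = $data[$found[0].LineNumber - 1]

$numbers
$firstLine
```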
The second Select-String does regex matching with the regex pattern being -Pattern "FOOTER". You could make this a variable instead if it could possibly change. Again, it uses regex because we did not specify the -SimpleMatch parameter. -AllMatches is included here as well, although Select-String already returns every line in the file that contains FOOTER.
$startIndex is used to keep track of where we are in the array. It plays a role in helping us grab the different sections of the selected text.
$output is an arraylist that contains the lines we are reading from $data and the selected text that matches your report header (the Select-String -Pattern $Report output). It is an arraylist so that we have access to the Add() method for more efficiently constructing a collection. It is much more efficient than using += on a regular array, which builds a new array on every addition.
The heart of the code starts with a foreach loop that loops through each object in $Reports. Each current object is stored in $line, so $line is a MatchInfo object. $section is a two-element array holding the zero-based indexes (line numbers offset by -1 because indexes start at 0) of the current $report match and of the next available FOOTER match after it. The if statements within the loop are just dealing with certain conditions, like the $report match being on the first or second line of the file or on the first or second line of the next section. The foreach loop will ultimately output all text leading up to the first $report match, the text within each $report match including the FOOTER match, and the text between all matches.
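The index arithmetic behind $section can be sketched in isolation. The numbers below are hypothetical: a report header on line 6 and footers on lines 2, 5 and 8 of an eight-line file.

```powershell
# Hypothetical 1-based line numbers, as Select-String would report them
$footerLines = 2, 5, 8
$reportLine = 6

# 'First' stops at the first footer that comes after the report header
$nextFooter = $footerLines.Where({ $_ -gt $reportLine }, 'First')[0]

# Convert both to zero-based indexes for use with the $data array
$section = ($reportLine - 1), ($nextFooter - 1)

# Stand-in for the file contents
$data = '<footer>', '<ReportType2>', 'data', 'data', '<footer>', '<ReportType1>', 'data', '<footer>'
$block = $data[$section[0]..$section[1]]

$section
$block
```

The .Where() method with a mode argument is available from PowerShell 4 onward.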
The if statements after the foreach loop add the rest of the file beyond the last match to $output.
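For comparison, the same result could probably also be reached in a single pass with a flag that is switched on at a report header and off at the footer. This is a different technique from the index-based code above, offered only as a rough sketch: the function name Convert-ReportLines, its parameters, and the sample data are all hypothetical, and it assumes the report name never appears inside the body of another report.

```powershell
# Hypothetical helper, not part of the code above
function Convert-ReportLines {
    param(
        [string[]]$Lines,
        [string]$Report,
        [string]$OldWord,
        [string]$NewWord,
        [string]$Footer = 'FOOTER'
    )
    # Escape everything so the inputs are matched as literal text
    $reportPattern = [regex]::Escape($Report)
    $footerPattern = [regex]::Escape($Footer)
    $wordPattern = [regex]::Escape($OldWord)

    $inTarget = $false
    foreach ($line in $Lines) {
        # A matching header switches replacement on for this report
        if ($line -match $reportPattern) { $inTarget = $true }

        if ($inTarget) { $line -replace $wordPattern, $NewWord } else { $line }

        # The footer ends the current report, whatever its type
        if ($line -match $footerPattern) { $inTarget = $false }
    }
}

# Assumed sample data for the illustration
$sample = 'HEADER', 'FOOTER', 'ReportType1', 'Liability 10', 'FOOTER',
          'ReportType2', 'Liability 20', 'FOOTER'
$result = Convert-ReportLines -Lines $sample -Report 'ReportType1' -OldWord 'Liability' -NewWord 'Asset'
$result
```

Because each line is tested on its own, the order of the report types in the file does not matter. For a real file, the $sample array would be replaced by the output of Get-Content and $result piped to Set-Content.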
Issues With Initial Attempt:
In your attempt, the thing creating a problem for you is the order of the reports in the file. If ReportType1 shows up after ReportType2 in the file, then the first If statement will always be true. You are not examining a block of lines. Instead, you are examining all remaining lines starting from a certain line. I'll try to illustrate what I'm saying with an example:
Below is a sample file with line numbers
1. <footer>
2. <ReportType2>
3. data
4. data
5. <footer>
6. <ReportType1>
7. data
8. <footer>
Your startline will be 1 after reaching the first footer. You then read all lines skipping 1, which includes line 2 and line 6. Because the left side is an array, (Get-Content $inputFile | Select-Object -Skip 1) -match "ReportType1" returns the matching lines rather than a Boolean, and a non-empty result counts as $true in an if statement. On the next pass through the loop, you will iterate until startline becomes 5. Then (Get-Content $inputFile | Select-Object -Skip 5) -match "ReportType1" will also find a match. The only way your logic will work is if the ReportType1 section comes before ReportType2 in the file.
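The behavior can be reproduced with the eight-line sample above held in an array (a sketch; the variable names are only for illustration):

```powershell
# The sample file from above as an array of lines
$lines = '<footer>', '<ReportType2>', 'data', 'data', '<footer>', '<ReportType1>', 'data', '<footer>'

# All remaining lines after skipping 1, as in the original Do loop
$remaining = $lines | Select-Object -Skip 1

# With an array on the left, -match filters instead of returning a Boolean
$result = @($remaining -match 'ReportType1')

# A non-empty result is treated as true, even though the next line is ReportType2
$branchTaken = if ($result) { 'ReportType1 branch' } else { 'other branch' }

# Testing only the next line gives the intended answer
$nextLineOnly = $lines[1] -match 'ReportType1'

$result
$branchTaken
$nextLineOnly
```

Comparing a single line at a time, rather than the whole remainder of the file, avoids the false match.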