Hinton

Reputation: 91

Comparison of strings not returning correct information

We're working with a text file that contains many different types of reports. Some of those reports need some words changed; the rest should be copied over exactly as they are.

The file has to stay a single text file, so the idea is to move through the file, comparing the lines. If a line is found that is a "ReportType1", then we need to change some wording, so we go into an inner loop, extracting the data and changing words as we go. The loop ends when it reaches a footer in the report and should move on to the next report.

We've tried -match, -like, -contains, -eq, but it never works quite like it's supposed to. We either get data that's been changed/reformatted that shouldn't be or we're only getting the header data.

Add-Type -AssemblyName System.Collections
Add-Type -AssemblyName System.Text.RegularExpressions

[System.Collections.Generic.List[string]]$content = @()

$inputFile   = "drive\folder\inputfile.txt"
$outputFile  = "drive\folder\outputfile.txt"

#This will retrieve the total number of lines in the file
$FileContent = Get-Content $inputFile
$FileLineCount = $FileContent | Measure-Object -Line
$TotalLines = $FileContent.Count

$TotalLines++ #Need to increase by one; the last line is blank

$startLine   = 0
$lineCounter = 0

#Start reading the file; this is the Header section
#Number of lines may vary, but data is copied over word
#for word
foreach($line in Get-Content $inputfile)
{
    $startLine++
    If($line -match "FOOTER")
    {
        [void]$content.Add( $line )
        break
    }
    else
    {
        [void]$content.Add( $line )
    }
}
## ^^This section works perfectly

#Start reading the body of the file
Do {
    #Start reading from the current position
    #This should change with each report read
    $line = Get-Content $inputFile | select -Skip $startLine

    If($line -match "ReportType1") #If it's a ReportType1, some wording needs to be changed
    {
        #Start reading the file from the current position
        #Should loop through this record only
        foreach($line in Get-Content $inputFile | select -skip $startline) 
        {
            If($line -match "FOOTER") #End of the current record
            {
                [void]$content.Add( $line )
                break #break out of the loop and continue reading the file from the new current position
            }
            elseif ($line -match "OldWord") #Have to replace a word on some lines
            {
                $line = $line.Replace("OldWord","NewWord")
                [void]$content.Add( $line ) 
            }
            else
            { 
                [void]$content.Add( $line ) 
            }
            $startline++                
        }
    }
    else
    {
         If($line -match "ReportType2") #ReportType2 can just be copied over line for line
         {
             #Start reading the file from the current position
             #Should loop through this record only
             foreach($line in Get-Content $inputFile | select -skip $startline) 
             {
                If($line -match "FOOTER") #End of the current record
                {
                    [void]$content.Add( $line )
                    break #break out of the loop and continue reading the file from the new current position
                }
                else
                { 
                    [void]$content.Add( $line ) 
                }
                $startline++                
             }
         }
    }
    $startline++
} until ($startline -eq $TotalLines)

[System.IO.File]::WriteAllLines( $outputFile, $content ) | Out-Null

It sort of works, but we're getting some unexpected behavior. The reports look fine and all, but it's changing words in "ReportType2", even though the code isn't set up to do that. It's like it's only going through the first IF statement. But how can it be if the lines don't match up?

We know the $startline variable is increasing through the iterations, so it's not like it's stuck on one line. However, doing 'Write-Host' shows $line is always "ReportType1", which can't be true because the lines are showing up in the reports like they're supposed to be.

SAMPLE DATA:

<header data>
.
43 lines (although this can vary)
.
<footer>
<ReportType1> 
. 
x number of lines (varies)
. 
<footer> 
<ReportType2> 
. 
x number of lines (varies)
. 
<footer>

And so on and so forth, until the end of the file. The different types of reports are all mixed together.

All we can figure is we're missing something, probably pretty obvious, that will get this to output the data correctly.

Any help is appreciated.

Upvotes: 0

Views: 114

Answers (1)

AdminOfThings

Reputation: 25041

The following should do what you want. Just replace the values for $oldword and $newword with your word replacements (these are case-insensitive for now) and the value of $report with the report header you want to update.

$oldword = [regex]::Escape("Liability") # escaped so -replace matches the text literally
$newword = "Asset"
$report = "ReportType1"
$data = Get-Content Input.txt
$reports = $data | Select-String -Pattern $Report -AllMatches
$footers = $data | Select-String -Pattern "FOOTER" -AllMatches
$startindex = 0
[collections.arraylist]$output = foreach ($line in $reports) {
    # Zero-based indexes of this report's header and of the first footer after it
    $section = ($line.linenumber-1),($footers.linenumber.where({$_ -gt $line.linenumber},'First')[0]-1)
    # Copy the lines between the previous section and this one unchanged
    if ($startindex -lt $section[0]-1) {
        $data[$startindex..($section[0]-1)]
    }
    if ($startindex -eq $section[0]-1) {
        $data[$startindex]
    }
    # Replace the word only inside the matched report section
    $data[$section[0]..$section[1]] -replace $oldword,$newword
    $startindex = $section[1]+1
}
# Append whatever follows the last matched section
if ($startindex -eq $data.count-1) {
    [void]$output.Add($data[$startindex])
}
if ($startindex -lt $data.count-1) {
    # AddRange appends each remaining line rather than one nested array
    $output.AddRange($data[$startindex..($data.count-1)])
}
$output | Set-Content Output.txt

Code Explanation:

The intention of $oldword is to be used in a regex replace operation. So any special regex characters will need to be escaped. I have opted to do that for you here. If you want to update the string that is to be replaced, you only need to update the characters between the quotes. This is case-insensitive when we pass it to the -replace operator.
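As a rough illustration of why escaping matters, the sketch below compares an unescaped and an escaped pattern. The sample strings and variable names are made up for the demonstration and are not from the original file.

```powershell
# Hypothetical values, chosen only to show the effect of escaping.
$rawWord = 'Net (Loss)'
$escaped = [regex]::Escape($rawWord)   # parentheses (and the space) become literals

$sample = 'Total net (loss) for period'

# Unescaped: the parentheses form a regex group, so the pattern looks for
# the text "Net Loss" and finds nothing to replace.
$unescapedResult = $sample -replace $rawWord, 'Gain'

# Escaped: the literal text is matched, ignoring case, and replaced.
$escapedResult = $sample -replace $escaped, 'Gain'

# -creplace is the case-sensitive variant; "net" does not match "Net".
$caseSensitiveResult = $sample -creplace $escaped, 'Gain'

$unescapedResult
$escapedResult
$caseSensitiveResult
```

If case matters for your data, swapping -replace for -creplace is the only change needed.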

$newword is simply the string that replaces whatever $oldword matches. It does not require any special handling unless the string contains special PowerShell characters or a $ sequence such as $1 or $&, which -replace treats as a substitution token. Otherwise the replacement text appears as is, including its case.

$report is the name of the header of the section where you want to replace data. This is case-insensitive when we pass it to Select-String -Pattern.

$data is just the contents of the file as an array. Each line of the file is an indexed item in the array.

The first Select-String does regex matching with the regex pattern being -Pattern $Report. It uses regex because we did not specify the -SimpleMatch parameter. Select-String already returns every line that matches; -AllMatches additionally records every match within a single line, so it is harmless here but not strictly required. The output is stored in $Reports. $Reports is an array of MatchInfo objects, which have properties that we will use like Line and LineNumber.
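To make the MatchInfo properties concrete, here is a small sketch using an in-memory array in place of the file. The sample lines are invented for illustration.

```powershell
# Invented sample lines standing in for Get-Content output.
$lines = '<header>', 'FOOTER', 'ReportType1 summary', 'data', 'FOOTER'

# @() guarantees an array even if only one line matches.
$hits = @($lines | Select-String -Pattern 'footer')   # case-insensitive by default

# LineNumber is 1-based, so subtract 1 to index into the array.
$firstFooterIndex = $hits[0].LineNumber - 1
$lines[$firstFooterIndex]

# -AllMatches only matters when one line contains the pattern more than once.
$single = 'FOOTER then FOOTER' | Select-String -Pattern 'FOOTER'
$all    = 'FOOTER then FOOTER' | Select-String -Pattern 'FOOTER' -AllMatches
$single.Matches.Count
$all.Matches.Count
```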

The second Select-String does regex matching with the regex pattern being -Pattern "FOOTER". You could make this a variable instead if it could possibly change. As before, it uses regex because we did not specify the -SimpleMatch parameter, and every line containing FOOTER is returned with or without -AllMatches.

$startIndex is used to keep track of where we are in the array. It plays a role in helping us grab the different sections of the selected text.

$output is an arraylist that contains the lines we are reading from $data and the selected text that matches your report header (the Select-String -Pattern $Report output). It is an arraylist so that we have access to the Add() method for more efficiently constructing a collection. It is much more efficient than using += and custom object arrays.

The heart of the code starts with a foreach loop that loops through each object in $Reports. Each current object is stored in $line, so $line is a MatchInfo object. $section is a two-element array holding the array indexes (line numbers offset by -1 because indexes start at 0) of the current $report match and of the next available FOOTER match after it. The if statements within the loop handle the lines that sit between the previous section and the current match: a single line in one case, a range of lines in the other. The foreach loop will ultimately output all text leading up to the first $report match, the text within each $report match including the FOOTER match, and the text between all matches.

The if statements after the foreach loop add the rest of the file beyond the last match to $output.
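For comparison, here is a different technique from the index-based code above: a single pass that keeps a flag for "currently inside a target report". This is only a sketch under the assumptions in the question (every report ends with a line containing FOOTER, and the header line contains the report name). The function name Convert-ReportLines and its parameters are hypothetical.

```powershell
# Hypothetical helper: single-pass alternative, not the approach used above.
function Convert-ReportLines {
    param(
        [string[]]$Lines,
        [string]$ReportPattern = 'ReportType1',
        [string]$FooterPattern = 'FOOTER',
        [string]$OldWord = 'OldWord',
        [string]$NewWord = 'NewWord'
    )
    $inTarget = $false
    foreach ($line in $Lines) {
        # Entering a report whose wording has to change
        if ($line -match $ReportPattern) { $inTarget = $true }

        if ($inTarget) {
            # Escape the word so it is matched literally
            $line -replace [regex]::Escape($OldWord), $NewWord
        }
        else {
            $line   # header data and other report types pass through unchanged
        }

        # A footer closes the current report, whatever its type
        if ($line -match $FooterPattern) { $inTarget = $false }
    }
}
```

It could be called as Convert-ReportLines -Lines (Get-Content $inputFile) | Set-Content $outputFile. Each line is tested on its own, so the order of report types in the file no longer matters.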

Issues With Initial Attempt:

In your attempt, the thing creating a problem for you is the order of the reports in the file. If ReportType1 shows up anywhere after ReportType2 in the file, then the first If statement will always be true. You are not examining a single line or a block of lines. Instead, you are examining all remaining lines starting from a certain line. I'll try to illustrate what I'm saying with an example:

Below is a sample file with line numbers

1. <footer>
2. <ReportType2>
3. data
4. data
5. <footer>
6. <ReportType1>
7. data
8. <footer>

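Your startline will be 1 after reaching the first footer. You then read all lines skipping 1, so $line is an array that includes both line 2 and line 6. When the left side of -match is an array, the operator returns the matching elements rather than $true or $false, and any non-empty result counts as true in an if statement. So (Get-Content $inputFile | Select-Object -Skip 1) -match "ReportType1" finds line 6 and the first branch is taken, even though the next report is ReportType2. The inner foreach loop then runs up to the footer on line 5, and startline ends up at 5. Then (Get-Content $inputFile | Select-Object -Skip 5) -match "ReportType1" will also find a match. The only way your logic will work is if no ReportType1 section comes after a ReportType2 section in the file.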
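The behavior described above can be reproduced with a small array. This is a minimal sketch with the sample file's lines typed in by hand.

```powershell
# Lines 2..8 of the sample file, i.e. what remains after skipping 1 line.
$remaining = '<ReportType2>', 'data', 'data', '<footer>', '<ReportType1>', 'data', '<footer>'

# With an array on the left, -match filters; it does not return a Boolean.
$result = $remaining -match 'ReportType1'
$result

# A non-empty array is truthy, so this branch is taken even though the
# next report in the file is ReportType2.
if ($remaining -match 'ReportType1') { $branchTaken = 'ReportType1' }
else { $branchTaken = 'other' }
$branchTaken

# Testing only the next line gives the answer you actually want.
$firstLineOnly = $remaining[0] -match 'ReportType1'
$firstLineOnly
```

In your script, that would mean testing a single line, for example the first element of the skipped content, instead of the whole remaining array.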

Upvotes: 1
