user2725402

Reputation: 4629

Ignore first and last line in file

I'm trying to replace characters in certain columns of multiple text files using PowerShell. I have it working perfectly except that I need to ignore the first and the last row in each file and I can't get that to work.

This is what I have so far:

$Location = "C:\Users\gerhardl\Documents\Tenacity\TEMP\POWERSHELL TESTS"
$Data = "$Location\*.TXT"
$Output = "$Location\Fixed"

Get-Item $Data |
    ForEach-Object {
        $file = $_
        $_ | 
            Get-Content | 
            ForEach-Object {
                $Beginning = $_.Substring(0,105)
                $Account = $_.Substring(105,20) -replace "[ABCDEFGHIJKLMNOPQRSTUVWXYZ]", " "
                $End = $_.Substring(125)
                '{0}{1}{2}' -f $Beginning,$Account,$End
            } |
            Set-Content -Path (Join-Path $Output  $file.Name)

    }

I know there are similar threads, but it seems that my For Each loop doesn't play well with those suggestions.

Upvotes: 3

Views: 12774

Answers (4)

mklement0

Reputation: 437968

Note: This post answers the generic question of how to exclude the first and last line of an input file / an input collection from processing.

Manu's helpful ... | Select-Object -Skip 1 | Select-Object -SkipLast 1 solution works great in PSv5+ (assuming the first and last line should be eliminated from the output).

However, their PSv4- solution doesn't work (as of this writing), because the array ([System.Object[]] instance) returned by Get-Content $file | Select-Object -Skip 1 doesn't have a .GetRange() method.
Here's a working solution that uses PowerShell's range operator (..):

# Read lines of the input file into an array.
$allLines = Get-Content $file
# Using the range operator (..), get all elements except the first and the last.
$allLines[1..([Math]::Max(1, $allLines.Count-2))]

Note:
* Trying [1..-1] is tempting, but does not work in PowerShell, because 1..-1 evaluates to subscripts 1, 0, -1.
* If you know that there are at least 3 input objects, you can omit the [Math]::Max() call.
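The two notes above can be checked with a small sketch. The sample array and the helper name Get-MiddleLines are hypothetical, introduced only for illustration; the helper additionally assumes that an input with fewer than 3 lines should yield nothing.

```powershell
# Sample data standing in for the lines of a file (hypothetical values).
$allLines = 'header', 'row1', 'row2', 'trailer'

# 1..-1 counts DOWN from 1 to -1, so it selects indices 1, 0 and -1 (the last element).
$wrong = $allLines[1..-1]                       # 'row1', 'header', 'trailer'

# Stopping at Count - 2 selects everything between the first and the last element.
$middle = $allLines[1..($allLines.Count - 2)]   # 'row1', 'row2'

# Hypothetical helper: returns nothing when there is no "middle" to speak of.
function Get-MiddleLines {
  param([string[]] $Lines)
  if ($Lines.Count -lt 3) { return @() }
  $Lines[1..($Lines.Count - 2)]
}
```

Wrapping the guard in a function keeps the short-input case explicit instead of relying on how the range happens to evaluate.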

The above solution, however, is not always an option, because it requires collecting all input objects in memory first, which negates the memory-throttling, one-by-one processing that a pipeline-based solution offers.
(Although the in-memory solution, if feasible, is faster.)

To address that in PSv4-, you can emulate Select-Object -SkipLast 1 in a pipeline-friendly manner as follows (Select-Object -Skip 1, which skips from the start, is already supported in PSv4-):

# 'one', 'two', 'three' is a sample array. Output is 'one', 'two'
'one', 'two', 'three' | ForEach-Object { $notFirst = $False } { 
  if ($notFirst) { $prevObj }; $prevObj = $_; $notFirst = $True
}

Output of each input object is delayed by one iteration, which effectively omits the last one.
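As a minimal sketch, the same one-iteration delay can be chained after Select-Object -Skip 1 to drop both the first and the last object while still streaming. The sample values are hypothetical.

```powershell
# Hypothetical sample lines; in practice these would come from Get-Content.
$lines = 'header', 'row1', 'row2', 'trailer'

$middle = $lines |
  Select-Object -Skip 1 |                     # drop the first object
  ForEach-Object { $notFirst = $false } {     # 1st block = begin, 2nd block = process
    if ($notFirst) { $prevObj }               # emit the object seen one iteration ago
    $prevObj = $_
    $notFirst = $true
  }
# $middle now holds 'row1', 'row2'; 'trailer' is buffered but never emitted.
```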

Here's the generalization to -SkipLast <n>, implemented as advanced function Skip-Last, which uses a [System.Collections.Generic.Queue[object]] instance to delay output by <n> objects:

# Works in PSv2+
# In PSv5+, use `Select-Object -SkipLast <int>` instead.
Function Skip-Last {
  <#
  .SYNOPSIS
    Skips the last N input objects provided.
    N defaults to 1.
  #>
  [CmdletBinding()]
  param(
    [ValidateRange(1, 2147483647)] [int] $Count = 1,
    [Parameter(Mandatory = $True, ValueFromPipeline = $True)]$InputObject
  )

  begin { 
    $mustEnumerate = -not $MyInvocation.ExpectingInput # collection supplied via argument
    $queuedObjs = New-Object System.Collections.Generic.Queue[object] $Count
  }
  process {
    # Note: $InputObject is either a single pipeline input object or, if
    #       the -InputObject *parameter* was used, the entire input collection.
    #       In the pipeline case we treat each object individually; in the
    #       parameter case we must enumerate the collection.
    foreach ($o in ((, $InputObject), $InputObject)[$mustEnumerate]) {
      if ($queuedObjs.Count -eq $Count) {
        # Queue is full, output its 1st element.
        # The queue in essence delays output by $Count elements, which 
        # means that the *last* $Count elements never get emitted.
        $queuedObjs.Dequeue()  
      }
      $queuedObjs.Enqueue($o)
    }
  }
}

Note: In the ValidateRange() attribute above, 2147483647 is used instead of [int]::MaxValue, because PSv2 only supports constants in this case.

Sample call:

PS> 'one', 'two', 'three', 'four', 'five' | Skip-Last 3
one
two

Upvotes: 2

Manu

Reputation: 1742

You can use -Skip 1 and -SkipLast 1:

Get-Content $file  | Select-Object -Skip 1 | Select-Object -SkipLast 1

Edit for PS < 5:

$text = Get-Content $file | Select-Object -Skip 1
$newText = $text.GetRange(0,($text.Count - 1))
$newText

Upvotes: 7

user2725402

Reputation: 4629

I managed to do this as follows. It's not exactly what I asked for, but I couldn't make that work. The first and last lines (header and trailer records) are much shorter than the data lines, so I process only lines longer than 30 characters and pass everything else through unchanged:

$Location = "C:\Users\gerhardl\Documents\Tenacity\TEMP\POWERSHELL TESTS"
$Data = "$Location\*.TXT"
$Output = "$Location\Fixed"

Get-Item $Data |
    ForEach-Object {
        $file = $_
        $_ | 
            Get-Content | 
            ForEach-Object {
                if ($_.Length -gt 30) {
                    # Data record: blank out letters in the account column.
                    $Beginning = $_.Substring(0,105)
                    $Account = $_.Substring(105,20) -replace "[ABCDEFGHIJKLMNOPQRSTUVWXYZ]", " "
                    $End = $_.Substring(125)
                    '{0}{1}{2}' -f $Beginning,$Account,$End
                }
                else {
                    # Header or trailer record: pass through unchanged.
                    $_
                }
            } |
            Set-Content -Path (Join-Path $Output  $file.Name)

    }

Upvotes: 0

TessellatingHeckler

Reputation: 28993

Tracking the first line is possible with a per-file Boolean, $IsFirstLine = $True, that you set to $False inside the ForEach-Object. Tracking the last line, however, is not possible with your pipeline method as it stands: by the time you know a line was the last one, you've already processed it.

So you'd need another loop to count the lines or a buffer to be able to undo the changes on the last line once you identified it.
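A minimal sketch of the buffer idea, assuming a one-line look-ahead is enough: each line is held back until the next one arrives, so a line is only transformed once it is known not to be the last. The function name Convert-MiddleLines and its -Transform parameter are hypothetical, introduced only for this illustration.

```powershell
# Hypothetical helper: streams lines, applying $Transform to every line
# except the first and the last, which pass through unchanged.
function Convert-MiddleLines {
  param(
    [Parameter(Mandatory = $true)] [scriptblock] $Transform,
    [Parameter(ValueFromPipeline = $true)] [string] $Line
  )
  begin { $index = 0; $hasPrev = $false; $prev = $null }
  process {
    if ($hasPrev) {
      # Another line followed $prev, so $prev cannot be the last line.
      if ($index -eq 1) { $prev } else { & $Transform $prev }
    }
    $prev = $Line; $hasPrev = $true; $index++
  }
  end {
    # Whatever is still buffered is the last line: emit it untouched.
    if ($hasPrev) { $prev }
  }
}

# Usage with hypothetical sample data; upper-casing stands in for the real column fix.
$result = 'header', 'row1', 'row2', 'trailer' |
  Convert-MiddleLines -Transform { param($l) $l.ToUpper() }
# $result: 'header', 'ROW1', 'ROW2', 'trailer'
```

Only one line is held in memory at a time, so this should also suit files that are too large to read in full.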

If the files are small enough to read into memory, maybe you could use an approach like:

$Location = "C:\Users\gerhardl\Documents\Tenacity\TEMP\POWERSHELL TESTS"
$Data = "$Location\*.TXT"
$Output = "$Location\Fixed"

Get-Item $Data | ForEach-Object {                   # for each file..

    $Lines = @(Get-Content $_.FullName)             # read all the lines, force array.
    $LinesToProcess = $Lines[1..($Lines.Count - 2)] # get lines except first and last.

    $ProcessedLines = $LinesToProcess | ForEach-Object {    # for each line..

        $Beginning = $_.Substring(0,105)
        $Account = $_.Substring(105,20) -replace "[ABCDEFGHIJKLMNOPQRSTUVWXYZ]", " "
        $End = $_.Substring(125)
        '{0}{1}{2}' -f $Beginning,$Account,$End

    }

    $OutputLines = @($Lines[0]) + $ProcessedLines + $Lines[-1] # add original first and last; @() forces array concatenation

    $OutputLines | Set-Content -Path (Join-Path $Output $_.Name)

}

Upvotes: 3
