Reputation: 559
This is a duplicate of "How to exit from ForEach-Object in PowerShell". I'm not sure how to mark it as such, but I'm asking the question again because the answers to that question contain a ton of helpful information, including ways to handle the problem. However, the accepted answer from 2012 doesn't seem correct, and the nuggets of wisdom are buried and shouldn't be.
So my script is looping through all of the CSV files in a directory and then taking action on those that contain records with the wrong number of columns. When I first ran this, it took a long time and I realized that some of the CSVs actually have the correct number of columns for all rows and I didn't need to take action on them, so I decided to try and implement something to check for at least one row with the incorrect number of columns and then take some kind of action on the file. To do so, I needed to kick out of my ForEach-Object loop once I found a qualifying row.
Here is the original script:
$ParentPath = "C:\Users\<username>\Documents\Temporary\*"
$Files = Get-ChildItem -Path $ParentPath -Include *.csv
foreach ($File in $Files) {
    $OldPath = $File | % { $_.FullName }
    $OldContent = Get-Content $OldPath
    $NewContent = $OldContent |
        Where-Object {$_.split(",").Count -eq 11}
    Set-Content -Path "$(Split-Path $OldPath)\$($File.BaseName).txt" -Value $NewContent
}
Implementing some type of 'pre-check' on the files proved difficult. Even though I was able to use an answer from SO to properly write out the 'bad lines', I was unable to quit out of the ForEach-Object loop without processing all rows (thereby defeating the entire purpose of pre-checking the files for offending rows).
The code below works great at identifying the first bad row, and if you remove the ;break then it'll write out every offending row:
Get-Content $OldPath | ?{$_} | %{if(!($_.split(",").Count -eq 11)){"Process stopped at line number $($_.ReadCount), incorrect column count of: $($_.split(",").Count).";break}}
So how do I combine the two scripts to pre-check files for an offending row, and then take action on the files that need it? See proposed answer below!
Upvotes: 1
Views: 3015
Reputation: 11
One can exit a ForEach-Object "loop" with the following construction:
$objects = "Brakes","Wheels","Windows"
$Break = $False
$objects | Where-Object { $Break -eq $False } | ForEach-Object {
    $Break = $_ -eq "Wheels";
    Write-Output "The car has $_.";
}
I didn't invent this myself; I found it at https://linuxhint.com/how-to-exit-from-foreach-object-in-powershell/
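One thing worth keeping in mind with this construction: it stops the ForEach-Object body from running for later objects, but it doesn't stop the pipeline itself. Upstream commands keep producing objects and Where-Object simply discards them. Here is a small sketch that counts how many objects the upstream stage emits (the variable names $seen and $stop are made up for illustration):

```powershell
$seen = 0        # how many objects the upstream stage has emitted
$stop = $false   # flag that the last stage sets once it wants to "exit"

$result = 1..10 |
    ForEach-Object { $seen++; $_ } |           # stand-in for an upstream command such as Get-Content
    Where-Object { -not $stop } |              # drops everything once the flag is set
    ForEach-Object { $stop = ($_ -eq 3); $_ }  # "exit" after the value 3

# $result holds 1, 2, 3, but $seen ends up at 10:
# the upstream stage still processed every input object.
"Processed downstream: $($result -join ', '); emitted upstream: $seen"
```

So for a large file piped from Get-Content, the whole file would still be read.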
Upvotes: 0
Reputation: 437062
Here's a PowerShell-idiomatic reformulation of your solution (PSv4+) that should perform much better.
Note, however, that it assumes that each .csv file fits into memory as a whole:
Get-ChildItem -LiteralPath $HOME\Documents\Temporary -Filter *.csv | ForEach-Object {
    $okRows, $brokenRows = (Get-Content -ReadCount 0 -LiteralPath $_.FullName).
        Where({ $_.split(",").Count -eq 11 }, 'Split')
    if ($brokenRows) {
        Set-Content -LiteralPath "$($_.DirectoryName)\$($_.BaseName).txt" -Value $okRows
    }
}
To address the question implied by your question's title:
Unfortunately, as of PowerShell Core 7.2 there is still no direct way to prematurely stop a pipeline on demand:
Select-Object -First is capable of that, but it uses a non-public exception type, so the behavior is limited to exiting the pipeline after the first N input objects.
GitHub issue #3821 is a long-standing feature request to make this capability generally available.
While the break statement is not directly suitable for exiting a pipeline on demand (it looks for an enclosing loop statement to break out of, on the entire call stack, and, if it finds none, terminates execution overall), you can make it work with a dummy loop.
See this answer for more information.
Quick example:
# Use of the dummy `do { ... } while ($false)` loop enables `break`
# to exit the pipeline.
do { 1..10 | ForEach-Object { $i=0 } { $_; if (++$i -eq 2) { break } } } while ($false)
The caveat with both Select-Object -First and break + dummy loop is that they skip the end block (for cmdlets written in PowerShell) / EndProcessing() method (for binary cmdlets) of all upstream cmdlets, which may cause problems.
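To make the caveat concrete, here is a minimal sketch. Invoke-Passthru is a hypothetical helper written only for this illustration (it is not a built-in command); it records in a hashtable whether its end block ran:

```powershell
# Hypothetical pass-through function: emits its input unchanged and
# sets $State.EndRan in its end block so we can see whether that block executed.
function Invoke-Passthru {
    param(
        [Parameter(ValueFromPipeline)] $InputObject,
        [hashtable] $State
    )
    process { $InputObject }
    end { $State.EndRan = $true }
}

# Full run: the pipeline completes normally, so the end block executes.
$full = @{ EndRan = $false }
$null = 1..5 | Invoke-Passthru -State $full

# Early exit: Select-Object -First 2 stops the pipeline after two objects.
$early = @{ EndRan = $false }
$firstTwo = 1..5 | Invoke-Passthru -State $early | Select-Object -First 2

"Full run, end block ran:   $($full.EndRan)"
"Early exit, end block ran: $($early.EndRan)"
```

If an upstream function does its cleanup or emits aggregated output in its end block, that work is lost when the pipeline is stopped early.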
Upvotes: 2
Reputation: 559
Tl;dr - here's my updated script:
$ParentPath = "C:\Users\<username>\Documents\Temporary\*"
$Files = Get-ChildItem -Path $ParentPath -Include *.csv
foreach ($File in $Files){
    $OldPath = $File | %{$_.FullName}
    Get-Content $OldPath | ?{!($_.split(",").Count -eq 11)} | Select -First 1 | %{[bool]$NeedsFix = 1}
    If($NeedsFix){Get-Content $OldPath | ?{($_.split(",").Count -eq 11)} | Set-Content -Path "$(Split-Path $OldPath)\$($File.BaseName).txt"}
    $NeedsFix = 0
}
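If the flag variable feels fragile, the same pre-check could be wrapped in a small function. This is only a sketch of one way to do it; the name Test-HasBadRow and its parameters are made up for illustration, and like the script above it treats empty lines as bad rows:

```powershell
# Hypothetical helper: returns $true if at least one line in the file
# does not split into the expected number of comma-separated columns.
function Test-HasBadRow {
    param(
        [Parameter(Mandatory)]
        [string] $LiteralPath,
        [int] $ExpectedColumns = 11
    )
    # Select-Object -First 1 stops the pipeline at the first offending line,
    # so Get-Content doesn't have to read the rest of the file.
    # @(...) keeps the result an array even if the offending line is empty.
    $firstBad = @(
        Get-Content -LiteralPath $LiteralPath |
            Where-Object { $_.Split(',').Count -ne $ExpectedColumns } |
            Select-Object -First 1
    )
    $firstBad.Count -gt 0
}
```

Inside the foreach loop, `if (Test-HasBadRow -LiteralPath $OldPath) { ... }` could then replace the $NeedsFix assignment and reset.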
To get here, let's summarize a few answers from the related question:
break, continue, and return in a ForEach-Object loop and the ForEach method. Also see MS-ScriptingGuy - but this doesn't solve my conundrum
-First 6? Sure beats creating a counter variable
Upvotes: 1