Dávid Laczkó

Reputation: 1101

PowerShell ForEach-Object elimination

Let's consider a collection of collections, and an operation that needs to be performed inside a pipeline on each element of the inner collection.

For the sake of simplicity, let it be an array of arrays, and the operation is simply printing to screen. For my question to be represented, let us also have an array whose elements are not collections:

$Array = "A", "B", "C"
$ArrayOfArrays = (1, 2, 3), (4, 5, 6), (7, 8, 9)

We know that piping will break a collection down to elements, like this:

$Array | & {process {Write-Host $_}}
$ArrayOfArrays | & {process {Write-Host $_}}

Now, to my surprise, this does not break the inner arrays down into their elements:

$ArrayOfArrays | % -process {Write-Host $_} (1)

and neither does this:

$ArrayOfArrays | % -process {% -process {Write-Host $_}} (2)

(The latter may seem like an unnecessary attempt, given that (1) does not do it either, but I tried it anyway.)
I expected (1) to work because I thought that piping performs one breakdown, and that ForEach-Object then breaks each element it receives down further if that element is a collection.

I could only solve it with inner piping:

$ArrayOfArrays | % -process {$_ | % -process {Write-Host $_}} (3)

However, with this approach I can of course eliminate ForEach-Object:

$ArrayOfArrays | & {process {$_ | & {process {Write-Host $_}}}} (4)

So my 2 questions are:

1,

How can I access the elements of an inner collection in a pipeline other than with (3) and (4), or are these the only ways to do it?

2,

If (3) and (4) are the only ways to do what question 1 asks, then what is a valid use case for ForEach-Object where it cannot be eliminated? It could be a logical case, or a performance advantage over a script block. The fact that it looks nicer than a script block, with one pair of braces fewer, is not really enough for me...

.
EDIT after Manuel Batsching's answer:

Since ForEach-Object emits a collection's elements after processing them, we can do this (I have dropped Write-Host, which may not have been a good arbitrary operation, so let it be GetType):

$ArrayOfArrays | % -process {$_} | & {process {$_.GetType()}}

But we also know that if something returns a new object in the pipeline, it will trigger a breakdown if it is further piped and if it is a collection. So to do the breakdown, we can again eliminate ForEach-Object and do this:

$ArrayOfArrays | & {process {$_}} | & {process {$_.GetType()}}

And this dummy operation can be syntactically reduced if I define a filter like this:

Filter §
{
    param (
            [Parameter (Mandatory = $True, ValueFromPipeline = $True)]
            [Object]
            $ToBeTriggeredForBreakDown
    ) # end param

    $ToBeTriggeredForBreakDown

}

and use it like this:

$Array | § | & {process {$_.GetType()}}
$ArrayOfArrays | § | & {process {$_.GetType()}}

$ArrayOfArraysOfArrays = ((1, 2), (3, 4)), ((5, 6), (7, 8))
$ArrayOfArraysOfArrays | § | & {process {$_.GetType()}}
$ArrayOfArraysOfArrays | § | § | & {process {$_.GetType()}}

So it is still hard for me to see when I would use ForEach-Object. It seems completely useless to me, except for the kind of reasons I am asking about in my questions.

.
EDIT after research:

Some collections provide their own methods, e.g. since v4 arrays have a ForEach method, so besides (3) and (4), one can do this (again a dummy operation, but with less code):

$ArrayOfArrays.ForEach{$_} | & {process {$_.GetType()}}

so this partially covers question 1.

Upvotes: 1

Views: 1202

Answers (2)

AdminOfThings

Reputation: 25031

In PowerShell 7, Foreach-Object has the -Parallel parameter, which takes a script block and runs it in parallel for the piped objects. This is not necessarily faster for every type of processing, so you will have to experiment with it.
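As a minimal sketch of the parallel form (assuming PowerShell 7 or later; the variable $offset and the throttle value are made up for illustration), outer variables have to be referenced with the $using: scope modifier because each script block runs in its own runspace:

```powershell
# Requires PowerShell 7 or later; -Parallel does not exist in Windows PowerShell 5.1.
$offset = 100   # hypothetical outer variable, only for illustration

$results = 1..5 | ForEach-Object -Parallel {
    # Each invocation runs in a separate runspace, so the outer
    # variable must be read through the $using: scope modifier.
    $_ + $using:offset
} -ThrottleLimit 2

# Output order is not guaranteed with -Parallel, so sort before comparing.
$sorted = $results | Sort-Object
$sorted -join ','
```

Whether this beats the sequential form depends on how much work each item needs; for trivial script blocks the runspace overhead usually dominates.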

Foreach-Object's -Process parameter takes an array of script blocks. So you could technically run several different script blocks against each piped object.

1,2,3 | Foreach-Object -begin {"First loop iteration"} -process {$_ + 1},{$_ + 2},{$_ + 3} -End {"Last loop iteration"}
First loop iteration
2
3
4
3
4
5
4
5
6
Last loop iteration

# Example of already having script blocks defined
$sb1,$sb2,$sb3 = { $_ + 1 },{$_ + 2},{$_ + 3}
1,2,3 | Foreach-Object -begin {"Starting the loop"} -process $sb1,$sb2,$sb3 -end {"the loop finished"}
Starting the loop
2
3
4
3
4
5
4
5
6
the loop finished

Foreach-Object also supports operation statements. You are not required to use them, but 1,2,3 | Foreach ToString is arguably more readable than 1,2,3 | & { process { $_.ToString() }}.
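A short sketch of the operation statement form next to its script block equivalent (the sample strings are arbitrary):

```powershell
$words = 'alpha', 'be', 'cat'   # arbitrary sample data

# Operation statements: name a property or a method instead of writing a script block.
$lengths = $words | ForEach-Object Length     # property access
$upper   = $words | ForEach-Object ToUpper    # method call without arguments

# Equivalent script block form of the property access.
$lengthsViaBlock = $words | & { process { $_.Length } }

$lengths -join ','
$upper -join ','
```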

Foreach-Object also has the -InputObject parameter, which lets you process the entire object as one item. That is its way of preventing the array unwrapping that you see in the pipeline. You can do the same with your method, but you must do the somewhat obscure array wrapping yourself, like ,@(1,2,3), before sending the array down the pipeline.

# Single pipeline object
$count = 1
ForEach-Object -InputObject 1,2,3 -Process {"Iteration Number: $count"; $_; $count++}
Iteration Number: 1
1
2
3

# array unwrapping down pipeline

$count = 1
1,2,3 | ForEach-Object -Process {"Iteration Number: $count"; $_; $count++}
Iteration Number: 1
1
Iteration Number: 2
2
Iteration Number: 3
3
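The array wrapping mentioned above, applied to the & { process { } } form, might look like this sketch. The unary comma builds a one-element outer array, and the pipeline enumerates only that outer level:

```powershell
# Without wrapping: the pipeline enumerates the array, so process runs three times.
$unwrapped = 1, 2, 3 | & { process { $_.GetType().Name } }

# With the unary comma: only the one-element outer array is enumerated,
# so the inner array arrives in the process block as a single object.
$wrapped = , @(1, 2, 3) | & { process { $_.GetType().Name; $_.Count } }

$unwrapped -join ','   # Int32,Int32,Int32
$wrapped -join ','     # Object[],3
```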

Since Foreach-Object is a cmdlet, you gain access to the common parameters. For example, -OutVariable collects the command's output in a variable, and -PipelineVariable makes the current output of this command available to commands further down the pipeline.

# Using OutVariable
1,2,3 | Foreach-Object {$_ + 100} -OutVariable numbers |
    Foreach-Object -process { "Current Number: $_"; "Numbers Processed So Far: $numbers" }
Current Number: 101
Numbers Processed So Far: 101
Current Number: 102
Numbers Processed So Far: 101 102
Current Number: 103
Numbers Processed So Far: 101 102 103

# Using PipeLineVariable
1,2,3 | Foreach-Object {$_ + 100} -PipeLineVariable first |
    Foreach-Object {$_ * 2} -PipelineVariable second |
        Foreach-Object {"First number is $first"; "second number is $second"; "final calculation is $($_*3)" }
First number is 101
second number is 202
final calculation is 606
First number is 102
second number is 204
final calculation is 612
First number is 103
second number is 206
final calculation is 618

My test cases show that the data | & { process {}} method is faster than data | foreach-object -process {}. So it appears to be a tradeoff as to what you want to get out of it.

Measure-Command {1..100000 | & { process {$_}}}


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 107
Ticks             : 1074665
TotalDays         : 1.24382523148148E-06
TotalHours        : 2.98518055555556E-05
TotalMinutes      : 0.00179110833333333
TotalSeconds      : 0.1074665
TotalMilliseconds : 107.4665


Measure-Command {1..100000 | Foreach-Object {$_}}


Days              : 0
Hours             : 0
Minutes           : 0
Seconds           : 0
Milliseconds      : 768
Ticks             : 7686545
TotalDays         : 8.89646412037037E-06
TotalHours        : 0.000213515138888889
TotalMinutes      : 0.0128109083333333
TotalSeconds      : 0.7686545
TotalMilliseconds : 768.6545
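A single Measure-Command run can be noisy, so one way to compare the two forms more fairly is to average several runs. This is only a sketch: Get-AverageMilliseconds is a hypothetical helper written for this example, not a built-in command, and the absolute numbers will differ between machines and PowerShell versions.

```powershell
# Hypothetical helper: average the elapsed time of a script block over several runs.
function Get-AverageMilliseconds {
    param(
        [Parameter(Mandatory)] [scriptblock] $ScriptBlock,
        [int] $Runs = 5
    )
    $total = 0.0
    for ($i = 0; $i -lt $Runs; $i++) {
        $total += (Measure-Command -Expression $ScriptBlock).TotalMilliseconds
    }
    $total / $Runs
}

$data = 1..100000
$viaScriptBlock = Get-AverageMilliseconds { $data | & { process { $_ } } }
$viaCmdlet      = Get-AverageMilliseconds { $data | ForEach-Object { $_ } }

"script block: $viaScriptBlock ms, ForEach-Object: $viaCmdlet ms"
```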

With Foreach-Object, all code runs in the caller's current scope, including the contents of the script block. & runs code in a child scope, so variables set in that scope are not reflected in the parent (calling) scope on return. Use . instead of & to run the script block in the current scope.

# Notice $a outputs nothing outside of the loop
PS > 1,2,3 | & { begin {$a = 100} process { $_ } end {$a}}
1
2
3
100
PS > $a

PS >

# Notice with . $a is updated
PS > 1,2,3 | . { begin {$a = 100} process { $_ } end {$a}}
1
2
3
100
PS > $a
100
PS >

# foreach updates current scope (used a different variable, because  
# $a was already added by the previous command)
PS > 1,2,3 | foreach-object -begin {$b = 333} -process {$_} -end {$b}
1
2
3
333
PS > $b
333
PS >

Upvotes: 0

Manuel Batsching

Reputation: 3616

As I understand it, arrays are unwrapped once they are passed down the pipeline or to the output stream.

You will see this behaviour with all of the following approaches:

$ArrayOfArrays | % -process { $_ }
$ArrayOfArrays | & { process { $_ } }
foreach ($arr in $ArrayOfArrays) { $arr }

What hides the unwrapping in your example is the Write-Host cmdlet. Because this cmdlet writes to your console rather than to the output stream, it casts the input object to [string]. That is why you see a string representation of the inner arrays on your console.

Replace Write-Host with Write-Output and the inner arrays will be properly unwrapped:

 PS> $ArrayOfArrays | % -process { Write-Output $_ }
1
2
3
4
5
6
7
8
9
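To check this without relying on console formatting, one option is to count what goes into and what comes out of the script block. The following is a sketch using the array from the question:

```powershell
$ArrayOfArrays = (1, 2, 3), (4, 5, 6), (7, 8, 9)

# What the script block receives: three inner arrays (one level of enumeration).
$received = $ArrayOfArrays | ForEach-Object { $_.GetType().Name }

# What leaves the script block: emitting $_ enumerates it again, giving nine integers.
$emitted = $ArrayOfArrays | ForEach-Object { $_ }

# The unary comma suppresses that second enumeration, so three arrays come out.
$preserved = $ArrayOfArrays | ForEach-Object { , $_ }

$received -join ','   # Object[],Object[],Object[]
$emitted.Count        # 9
$preserved.Count      # 3
```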

EDIT:

You can use a debugger to determine exactly where the unwrapping is done. For example, use the following code in VSCode:

$ArrayOfArrays = (1, 2, 3), (4, 5, 6), (7, 8, 9)
$foo = $null
$foo = $ArrayOfArrays | % { Write-Output $_ }

Set a breakpoint on the line $foo = $null, add the variables $foo and $_ to the watch list, hit F5 to start the debugger, and watch the variables change while you hit F11 to step into the individual processing steps.

  • $_ will show the inner array, which is the current element in the pipeline.
  • $foo will receive only the unwrapped elements after the pipeline execution ends.

Upvotes: 1
