Plastikfan

Reputation: 4022

How to 'stream' input from 1 command to another in the same PowerShell pipeline without caching items

There is more to this question than it initially appears, but there is only so much text you can put into a title.

I have an existing PowerShell command that processes items from the pipeline using the begin/process and end blocks; this all works as expected. This command is an 'internal' command, meant to be invoked from another command as opposed to being an end-user function invoked interactively.

I now wish to write a second command that makes use of the first command and that also accepts input from the pipeline. The second function needs to use the same single pipeline as the first. However, the first function IS designed to be used interactively and is effectively a wrapper around the second function, which the user should not be concerned with.

Idiomatically, you would do something like this:

1..4 | first-command | second-command

but as I said before, second-command is a complicated command that would be clunky to use interactively. So I intend the user to do this instead:

1..4 | first-command

Where first-command handles the interaction with second-command as an internal implementation matter, all within a SINGLE pipeline. Also, you will note that I mentioned the word 'stream' in the title, meaning that first-command should not cache pipeline items, since the pipeline could be quite large.

I know what I'm actually asking may not be possible, but PowerShell is packed with surprises, which is why I ask the question.

I have written the following Pester test cases which illustrate what I am trying to achieve.

  Context 'given: pipeline variable defined as a scalar value' {
    It 'should: invoke pipeline in a single pass' {
      function invoke-first {
        param(
          [Parameter(ValueFromPipeline = $true)]
          [int]$pipelineItem
        )

        begin { Write-Host '>>> invoke-first [SCALAR] >>>'; }
        process { $pipelineItem | invoke-second }
        end { Write-Host '<<< invoke-first [SCALAR] <<<'; }
      }

      function invoke-second {
        param(
          [Parameter(ValueFromPipeline = $true)]
          [int]$pipelineItem
        )

        begin { Write-Host '>>> invoke-second [SCALAR] >>>'; }
        process { Write-Host "  [+] SECOND $($pipelineItem * 2)"; }
        end { Write-Host '<<< invoke-second [SCALAR] <<<'; }
      }
      # How to correctly 'stream' the items through in the same pipeline, instead of
      # 1 at a time, without caching the items?
      #
      1..4 | invoke-first
    }
  }

This displays the following:

>>> invoke-first [SCALAR] >>>
>>> invoke-second [SCALAR] >>>
  [+] SECOND 2
<<< invoke-second [SCALAR] <<<
>>> invoke-second [SCALAR] >>>
  [+] SECOND 4
<<< invoke-second [SCALAR] <<<
>>> invoke-second [SCALAR] >>>
  [+] SECOND 6
<<< invoke-second [SCALAR] <<<
>>> invoke-second [SCALAR] >>>
  [+] SECOND 8
<<< invoke-second [SCALAR] <<<
<<< invoke-first [SCALAR] <<<

The problem this shows is that this line of code (in invoke-first):

process { $pipelineItem | invoke-second }

is creating a new, separate single-item pipeline for each item it receives. This is denoted by the fact that we see '>>> invoke-second [SCALAR] >>>' and '<<< invoke-second [SCALAR] <<<' for every pipeline item. This is not what was intended.

I also tried changing the definition of pipelineItem to an array ("[int[]]$pipelineItem"), but this does NOT make the desired difference.

As previously stated, in order to achieve the outcome I require, we would need to invoke invoke-second on the command line (but this is exactly what I'm trying to avoid):

  Context 'given: pipeline variable defined as a piped scalar value' {
    It 'should: invoke pipeline in a single pass' {
      function invoke-first {
        param(
          [Parameter(ValueFromPipeline = $true)]
          [int]$pipelineItem
        )

        begin { Write-Host '>>> invoke-first [PIPED-SCALAR] >>>'; }
        process { $pipelineItem }
        end { Write-Host '<<< invoke-first [PIPED-SCALAR] <<<'; }
      }

      function invoke-second {
        param(
          [Parameter(ValueFromPipeline = $true)]
          [int]$pipelineItem
        )

        begin { Write-Host '>>> invoke-second [PIPED-SCALAR] >>>'; }
        process { Write-Host "  [+] SECOND $($pipelineItem * 2)"; }
        end { Write-Host '<<< invoke-second [PIPED-SCALAR] <<<'; }
      }
      # Don't like this because invoke-second is a complicated internal function
      # that the user should not need to know about and would be cumbersome in
      # an interactive session.
      #
      1..4 | invoke-first | invoke-second
    }
  }

produces this as output:

>>> invoke-first [PIPED-SCALAR] >>>
>>> invoke-second [PIPED-SCALAR] >>>
  [+] SECOND 2
  [+] SECOND 4
  [+] SECOND 6
  [+] SECOND 8
<<< invoke-first [PIPED-SCALAR] <<<
<<< invoke-second [PIPED-SCALAR] <<<

... and '>>> invoke-second [PIPED-SCALAR] >>>' and '<<< invoke-second [PIPED-SCALAR] <<<' are each displayed just once, indicating there is only 1 pipeline.

If we cache the pipeline items in the first command:

  Context 'given: cached pipeline variable defined as a scalar value' {
    It 'should: invoke pipeline in a single pass' -Tag 'Current' {
      function invoke-first {
        param(
          [Parameter(ValueFromPipeline = $true)]
          [int]$pipelineItem
        )
        # This method cheats, because it caches the items into a collection
        #
        begin { $coll = @(); Write-Host '>>> invoke-first [CACHED-SCALAR] >>>'; }
        process { $coll += $pipelineItem }
        end { $coll | invoke-second; Write-Host '<<< invoke-first [CACHED-SCALAR] <<<'; }
      }

      function invoke-second {
        param(
          [Parameter(ValueFromPipeline = $true)]
          [int]$pipelineItem
        )

        begin { Write-Host '>>> invoke-second [CACHED-SCALAR] >>>'; }
        process { Write-Host "  [+] SECOND $($pipelineItem * 2)"; }
        end { Write-Host '<<< invoke-second [CACHED-SCALAR] <<<'; }
      }
      1..4 | invoke-first;
    }
  }

which produces this output:

>>> invoke-first [CACHED-SCALAR] >>>
>>> invoke-second [CACHED-SCALAR] >>>
  [+] SECOND 2
  [+] SECOND 4
  [+] SECOND 6
  [+] SECOND 8
<<< invoke-second [CACHED-SCALAR] <<<
<<< invoke-first [CACHED-SCALAR] <<<

So the root of my problem is the interaction of the pipeline from the first command:

process { $pipelineItem | invoke-second }

How can we pipe the items to invoke-second, without caching and without forcing the user to invoke invoke-second? Again, I realise I may be barking up the wrong tree here, but I'm hoping there is a different technique that I can use that I'm not aware of.

EDIT: Integrating the pipelines of invoke-first and invoke-second

Currently, both commands define a parameter that accepts input from the pipeline, e.g.:

[Parameter(ParameterSetName = 'InvokeScriptBlock', Mandatory, ValueFromPipeline = $true)]
[Parameter(ParameterSetName = 'InvokeFunction', Mandatory, ValueFromPipeline = $true)]
[System.IO.FileSystemInfo]$pipelineItem,

At the moment invoke-first has its own begin/process/end blocks, but it caches items into a temporary collection in the process block and then, in the end block, pipes them into invoke-second all at once. This is what I want to get rid of:

process-block(invoke-first):

$collection += $pipelineItem;

end-block(invoke-first):

$collection | Invoke-Second @parameters

Somehow, I need to feed the pipeline of invoke-second from invoke-first, via the proxy.

With the proxy command, we now have 3 sets of begin/process/end blocks which need to be integrated into a single pipeline, without making any changes to the code in invoke-second, because it is used in other contexts.
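For context, the mechanism that makes this feasible is the runtime's steppable pipeline: invoke-first can build a pipeline for invoke-second once, then drive its begin/process/end blocks itself, so items stream through without being cached. A minimal hand-written sketch (not generated proxy code, and using the simple [int] signatures from the question):

```powershell
function invoke-first {
    param(
        [Parameter(ValueFromPipeline = $true)]
        [int]$pipelineItem
    )

    begin {
        # Build a pipeline for invoke-second once, then step it manually
        $scriptCmd = { invoke-second }
        $steppable = $scriptCmd.GetSteppablePipeline()
        $steppable.Begin($PSCmdlet)
    }
    process {
        # Feed each incoming item into invoke-second's process block; no caching
        $steppable.Process($pipelineItem)
    }
    end {
        # Runs invoke-second's end block exactly once
        $steppable.End()
    }
}
```

With this version, `1..4 | invoke-first` should print invoke-second's begin and end banners once each, which is the single-pass behaviour the test asserts.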

Upvotes: 1

Views: 995

Answers (1)

Mathias R. Jessen

Reputation: 174825

This can be solved with a proxy command - a proxy command is, as the name implies, a way to "wrap" a command in a new one, while retaining the pipeline behavior you'd expect when invoking the target command directly.

Implementing this pattern can be a bit convoluted, so instead of writing it from scratch I'm gonna show you how to generate the source code for a proxy function using the built-in [ProxyCommand] helper class!

In the following, I'll be using the terms "proxy"/"proxy command"/"proxy function" interchangeably to refer to the function that'll replace Invoke-First from your example, and "target command"/"target function" to refer to the internal function we're trying to wrap, Invoke-Second in this case.

Generating a proxy command

These are the steps required to generate a proxy function in PowerShell:

using namespace System.Management.Automation

# Start by making sure the target function is discoverable (note on module-scoped proxies below)
function Invoke-Second {
    param(
      [Parameter(ValueFromPipeline = $true)]
      [int]$pipelineItem,

      [Parameter()]
      [ValidateRange(1,100)]
      [int]$Factor = 2
    )

    begin { Write-Host '>>> invoke-second [PIPED-SCALAR] >>>'; }
    process { Write-Host "  [+] SECOND $($pipelineItem * $factor)"; }
    end { Write-Host '<<< invoke-second [PIPED-SCALAR] <<<'; }
}

# Discover corresponding CommandInfo object using Get-Command
$InvokeSecondCmdInfo = Get-Command Invoke-Second

# Extract metadata for the CommandInfo object
$InvokeSecondCmdMeta = [CommandMetaData]::new($InvokeSecondCmdInfo)

# Create plain proxy command stub
$InvokeSecondProxyCmd = [ProxyCommand]::Create($InvokeSecondCmdMeta)

# (optional) Write proxy function definition to file
$InvokeSecondProxyCmd |Set-Content ".\Invoke-SecondProxy.ps1"

(The parameterization of $Factor is intentional, it'll make sense in a second, I promise)

The resulting function will act exactly as if the caller had invoked Invoke-Second directly in a nested pipeline:

PS C:\Users\Plastikfan> 1..4 |.\Invoke-SecondProxy.ps1
>>> invoke-second [PIPED-SCALAR] >>>
  [+] SECOND 2
  [+] SECOND 4
  [+] SECOND 6
  [+] SECOND 8
<<< invoke-second [PIPED-SCALAR] <<<

That would be it for the sample function you posted in the question - we're basically done - but I'm gonna assume that in your real-life use case you will not have a 1-to-1 relationship between the parameters you want to expose publicly and the parameters accepted by the internal target command - so we'll tackle that below as well.

Module-scoped proxies

Executing the code above will result in a proxy function that will work as expected for a simple script module that defines the same Invoke-Second function.

That being said, I'd strongly recommend executing the code to generate the proxy function inside the target module scope. This is simply to ensure exact and appropriate discoverability of the target function at runtime, since the proxy function itself will also be executing in module scope.

To execute a script or script block in a specific module scope, pass the module as the first argument to the & call operator:

$targetModule = Get-Module MyModule
& $targetModule .\GenerateProxy.ps1

... where GenerateProxy.ps1 is the same as the example above, but without the inline function Invoke-Second ... definition - we want Get-Command to discover the "real one" from the $targetModule.
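A minimal GenerateProxy.ps1 along those lines might look like this (the file names and module name are illustrative):

```powershell
# GenerateProxy.ps1 -- run inside the target module's scope via:
#   & (Get-Module MyModule) .\GenerateProxy.ps1
using namespace System.Management.Automation

# Resolve the real Invoke-Second from the module scope we're executing in
$cmdInfo = Get-Command Invoke-Second
$cmdMeta = [CommandMetadata]::new($cmdInfo)

# Emit the proxy function body to a file for further editing
[ProxyCommand]::Create($cmdMeta) |Set-Content .\Invoke-SecondProxy.ps1
```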


Modifying user-facing parameter sets

Let's first take a look at the param and begin blocks of the generated proxy function definition - this is where we have a chance to modify explicitly bound parameter arguments before invoking the target function.

[CmdletBinding()]
param(
    [Parameter(Position=0, ValueFromPipeline=$true)]
    [int]
    ${pipelineItem},

    [Parameter(Position=1)]
    [ValidateRange(1, 100)]
    [int]
    ${Factor})
begin
{
    try {
        $outBuffer = $null
        if ($PSBoundParameters.TryGetValue('OutBuffer', [ref]$outBuffer))
        {
            $PSBoundParameters['OutBuffer'] = 1
        }
        $wrappedCmd = $ExecutionContext.InvokeCommand.GetCommand('invoke-second', [System.Management.Automation.CommandTypes]::Function)
        $scriptCmd = {& $wrappedCmd @PSBoundParameters }
        $steppablePipeline = $scriptCmd.GetSteppablePipeline()
        $steppablePipeline.Begin($PSCmdlet)
    } catch {
        throw
    }
}

As you can see, the param block is effectively just a 1-to-1 copy of the target command's param block!

We can modify the declared parameters as we see fit, we just need to make sure we don't pass anything off to the target command that it doesn't accept.

Moving on to the begin block, the first 4 lines inside the try block ($outBuffer = ...) are meant to prevent the runtime processor from "double buffering" the output from the target command in the wrapper; you can ignore those and just leave them as-is. I'd personally suggest adding our custom parameter modification code immediately after this block, but anywhere in the begin block prior to the definition of $scriptCmd will work just fine.
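For completeness, the remainder of the generated stub just steps the pipeline once per input item and then finalizes it. The blocks emitted by [ProxyCommand]::Create typically look roughly like this:

```powershell
process
{
    try {
        # Forward each incoming pipeline item to the wrapped command's process block
        $steppablePipeline.Process($_)
    } catch {
        throw
    }
}

end
{
    try {
        # Run the wrapped command's end block exactly once
        $steppablePipeline.End()
    } catch {
        throw
    }
}
```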

Pruning proxy parameters

If you want to prevent the caller from controlling specific parameter arguments passed to the target command, simply remove them from the proxy function's param block:

param(
    [Parameter(Position=0, ValueFromPipeline=$true)]
    [int]
    ${pipelineItem})

If you want to pass a specific argument value for the parameter in question, add it to the $PSBoundParameters dictionary inside the begin block:

$PSBoundParameters['Factor'] = 10

If you don't want to pass the parameter to the target command at all, no modification is needed beyond its removal from param.

Adding extra parameters to the proxy

You might also want to accept parameter arguments that should not be passed to the target command. In this case, we first need to add the new parameter definition to the param block in the proxy:

param(
    [Parameter(Position=0, ValueFromPipeline=$true)]
    [int]
    ${pipelineItem},

    [Parameter(Position=1)]
    [ValidateRange(1, 100)]
    [int]
    ${Factor},

    [Parameter(Position=2)]
    [int]
    ${NotRelevantToInvokeSecond})

The caller can now pass -NotRelevantToInvokeSecond 123 to the proxy, so we need to remove it from $PSBoundParameters in begin - otherwise the target command will throw a parameter binding exception:

if($PSBoundParameters.ContainsKey('NotRelevantToInvokeSecond')){
  # $null = ... suppresses the boolean that Remove() would otherwise emit
  $null = $PSBoundParameters.Remove('NotRelevantToInvokeSecond')
}

If you have multiple extra proxy parameters like this, you might want to use a simple loop to take care of removal:

'ExtraParam1','ExtraParam2','AnotherParam' |ForEach-Object {
  if($PSBoundParameters.ContainsKey($_)){
    $null = $PSBoundParameters.Remove($_)
  }
}

Next steps

This should get you started, but you can do whatever you want with the proxy function as long as you ensure the splatting arg passed to $wrappedCmd in the $scriptCmd block is valid for the target command - your imagination sets the limit here :)
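To tie the pieces together, here's a sketch of what a finished proxy might look like once you've pruned -Factor from the public surface, pinned its value, and added a proxy-only switch (the -Quiet parameter and the pinned value 10 are illustrative choices, not part of the generated output):

```powershell
function Invoke-First {
    [CmdletBinding()]
    param(
        [Parameter(Position = 0, ValueFromPipeline = $true)]
        [int]$pipelineItem,

        # Proxy-only parameter; stripped before invoking the target
        [Parameter()]
        [switch]$Quiet
    )

    begin {
        try {
            # Pin the target's -Factor and drop the proxy-only switch
            $PSBoundParameters['Factor'] = 10
            $null = $PSBoundParameters.Remove('Quiet')

            $wrappedCmd = $ExecutionContext.InvokeCommand.GetCommand(
                'Invoke-Second', [System.Management.Automation.CommandTypes]::Function)
            $scriptCmd = { & $wrappedCmd @PSBoundParameters }
            $steppablePipeline = $scriptCmd.GetSteppablePipeline()
            $steppablePipeline.Begin($PSCmdlet)
        } catch { throw }
    }
    process { try { $steppablePipeline.Process($_) } catch { throw } }
    end     { try { $steppablePipeline.End() } catch { throw } }
}
```

Pipeline input still streams through one item at a time via Process($_), while the explicitly bound (non-pipeline) arguments are splatted once in begin.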

Upvotes: 3
