Reputation: 6873
My understanding is that
$list = @('1', '2', '3', '1', '4')
Select-Object -InputObject $list -Unique
should return an array of just 4 elements, skipping the second '1' that is not unique.
But I am getting all 5 elements back. Am I understanding this wrong, or does Select-Object have a bug, at least in the PS 5.1 that I am testing on?
Upvotes: 2
Views: 663
Reputation: 439183
Pipe the array instead: $list | Select-Object -Unique returns the four unique elements. The reason is as follows.
With only a few exceptions, notably ConvertFrom-Csv, Get-Random, Join-String, and Get-Member, the -InputObject parameter should be thought of as an implementation detail whose purpose is to facilitate pipeline input, and which therefore shouldn't be used directly. The built-in cmdlets fall into one of the following categories:
Category A: A select few cmdlets, such as Get-Member, usefully distinguish between passing an input collection as an argument to -InputObject and implicitly enumerating the collection's elements via the pipeline.
Category B: A select few cmdlets, such as ConvertFrom-Csv (but not ConvertTo-Csv), Get-Random, Join-String, and Out-String, either have array-valued -InputObject parameters (e.g., -InputObject <psobject[]>) or explicitly enumerate the argument passed to their scalar -InputObject parameters (e.g., -InputObject <psobject>).
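Category C: Unfortunately, the majority of cmdlets have scalar -InputObject parameters and process a collection passed to -InputObject as a whole, that is, as a single object, which effectively makes the parameter useless for direct argument-passing. Select-Object is one of them, which is why all 5 elements come back in the question's example.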
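The three categories can be illustrated with the list from the question. This is only a minimal sketch: the variable names are made up for illustration, and Get-Random stands in for Category B because it is also available in Windows PowerShell 5.1 (Join-String is not).

```powershell
$list = @('1', '2', '3', '1', '4')

# Category A: Get-Member inspects the array itself when it is passed as an
# argument, but inspects the (string) elements when the array is piped.
$typeViaArgument = (Get-Member -InputObject $list | Select-Object -First 1).TypeName
$typeViaPipeline = ($list | Get-Member | Select-Object -First 1).TypeName

# Category B: Get-Random enumerates its -InputObject argument, so it can
# pick 2 of the 5 elements even without the pipeline.
$picked = @(Get-Random -InputObject $list -Count 2)

# Category C: Select-Object sees the array passed to -InputObject as ONE object,
# so -Unique has nothing to compare; only pipeline input is deduplicated.
$uniqueViaArgument = @(Select-Object -InputObject $list -Unique)
$uniqueViaPipeline = @($list | Select-Object -Unique)

"A: $typeViaArgument vs. $typeViaPipeline"   # System.Object[] vs. System.String
"B: picked $($picked.Count) elements"        # 2
"C: $($uniqueViaArgument.Count) vs. $($uniqueViaPipeline.Count) elements"   # 5 vs. 4
```

In short, for Select-Object and the other Category C cmdlets, piping is the only form that processes the elements individually.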
The Category C behavior is regrettable, because passing an already in-memory collection as an argument to a cmdlet is much faster than sending its elements one by one through the pipeline.
For instance, compare the runtime of passing 1 million items to Get-Random using direct argument-passing vs. the pipeline:
Get-Random -InputObject (1..1e6)
vs.
1..1e6 | Get-Random
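One way to make that comparison concrete is to wrap both forms in Measure-Command. The following is a rough sketch, not a rigorous benchmark; the variable names are illustrative, and the absolute timings depend on the machine and PowerShell version.

```powershell
# Build the input once so that only the call itself is timed.
$items = 1..1e6

# Direct argument-passing: the whole array is bound to -InputObject in one step.
$viaArgument = Measure-Command { Get-Random -InputObject $items }

# Pipeline: the elements are sent to Get-Random one by one.
$viaPipeline = Measure-Command { $items | Get-Random }

'{0:N0} ms via -InputObject, {1:N0} ms via the pipeline' -f
    $viaArgument.TotalMilliseconds, $viaPipeline.TotalMilliseconds
```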
Note that this optimization is sometimes also available for other cmdlet parameters; notably, you can pass a collection to Set-Content's -Value parameter as an alternative to piping it, which greatly speeds up writing.
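As a hedged illustration of the Set-Content case, the sketch below writes the same lines to a temporary file both ways and times each form. The line count and variable names are arbitrary choices for the example, and the measured difference will vary by system.

```powershell
# Sample data: 50,000 short lines, already in memory.
$lines = foreach ($i in 1..50000) { "line $i" }

# A throwaway file in the temp directory.
$file = [IO.Path]::GetTempFileName()

# Pass the collection directly to -Value ...
$viaValue = Measure-Command { Set-Content -LiteralPath $file -Value $lines }

# ... or pipe it element by element; both forms write one line per element.
$viaPipeline = Measure-Command { $lines | Set-Content -LiteralPath $file }

# Verify the result, then clean up.
$lineCount = @(Get-Content -LiteralPath $file).Count
Remove-Item -LiteralPath $file

'{0:N0} ms via -Value, {1:N0} ms via the pipeline, {2} lines written' -f
    $viaValue.TotalMilliseconds, $viaPipeline.TotalMilliseconds, $lineCount
```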
Here's a categorized and alphabetically sorted list of built-in cmdlets:
Category A: USEFUL DISTINCTION between pipeline input and explicit -InputObject use: to process collections as a whole, pass them to -InputObject; to process their elements one by one, use the pipeline:
Add-Member
Export-Clixml
Get-Member
Trace-Command
Category B: USEFUL EQUIVALENCE for flat collections [1]: you can pass flat collections directly to -InputObject to speed up processing:
ConvertFrom-Csv
Format-Custom
Format-List
Format-Table
Format-Wide
Get-Random
Join-String
Out-Host
Out-String
Category C: USELESS DISTINCTION: direct -InputObject use is pointless:
ConvertTo-Csv
ConvertTo-Html
ConvertTo-Xml
Export-Csv
ForEach-Object
Format-Hex
Get-Unique
Group-Object
Invoke-Command
Measure-Command
Measure-Object
Select-Object
Select-String
Sort-Object
Start-Job
Update-List
Where-Object
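To see what "pointless" means in practice for two other Category C cmdlets, here is a small sketch using Measure-Object and Sort-Object; the sample values and variable names are illustrative only.

```powershell
$numbers = 3, 1, 2

# Measure-Object counts the array passed to -InputObject as a single object.
$countViaArgument = (Measure-Object -InputObject $numbers).Count
$countViaPipeline = ($numbers | Measure-Object).Count

# Sort-Object cannot sort a single object, so the array comes back unchanged.
$sortedViaArgument = Sort-Object -InputObject $numbers
$sortedViaPipeline = $numbers | Sort-Object

"Count: $countViaArgument vs. $countViaPipeline"        # Count: 1 vs. 3
"Order: $sortedViaArgument vs. $sortedViaPipeline"      # Order: 3 1 2 vs. 1 2 3
```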
[1] Processing differences of nested collections between pipeline input and -InputObject use:
Those cmdlets that enumerate their -InputObject arguments perform only one level of enumeration on the input collection, and leave nested collections as-is.
By contrast, pipeline use can result in two levels of iteration, as the following Join-String example shows:
PS> Join-String -InputObject ('foo', ('bar', 'baz'))
foobar baz
Here, foo and the stringification of the nested array as a whole (bar baz) were joined.
PS> 'foo', ('bar', 'baz') | Join-String
foobarbaz
Here, foo and the enumerated elements of the nested array were joined.
The reason is that two processing passes happen in this case, due to the pipeline's enumeration behavior: foo is passed first, followed by the nested array 'bar', 'baz'; the single-level enumeration is performed on each, and the results across all input objects are joined.
Upvotes: 2