Gordon

Select-Object -Unique

My understanding is that

$list = @('1', '2', '3', '1', '4')
Select-Object -InputObject $list -Unique

should return an array of just 4 elements, omitting the second '1', which is a duplicate.

But I am getting all 5 elements back. Am I understanding this wrong, or does Select-Object have a bug, at least in PowerShell 5.1, which I am testing on?


Answers (1)

mklement0

  • With only a few exceptions, notably ConvertFrom-Csv, Get-Random, Join-String, and Get-Member, the -InputObject parameter should be thought of as an implementation detail whose purpose is to facilitate pipeline input, and which therefore shouldn't be used directly. The built-in cmdlets fall into one of the following categories:

    • Category A: A select few cmdlets such as Get-Member usefully distinguish between passing an input collection as an argument to -InputObject and implicitly enumerating the collection's elements via the pipeline.

    • Category B: A select few cmdlets, such as ConvertFrom-Csv (but not ConvertTo-Csv), Get-Random, Join-String and Out-String, either have array-valued -InputObject parameters (e.g., -InputObject <psobject[]>) or explicitly enumerate the argument passed to their scalar -InputObject parameters (e.g., -InputObject <psobject>).

      • For flat input collections (which are typical)[1], such cmdlets effectively treat direct argument-passing the same as pipeline input, except that direct argument-passing is much faster; see below.
    • Category C: Unfortunately, the majority of cmdlets have scalar -InputObject parameters and process collections passed to -InputObject as a whole, which effectively makes the parameter useless for direct argument-passing.
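
Applied to the question's code: Select-Object is in Category C, so the whole $list array arrives as a single input object, and -Unique has nothing to compare it with. A minimal sketch of the difference, reusing the question's $list; the variable names $asArgument and $viaPipeline are illustrative only:

```powershell
$list = @('1', '2', '3', '1', '4')

# Direct argument-passing: the array is ONE input object, so -Unique
# has nothing to deduplicate and the array comes back unchanged.
$asArgument = Select-Object -InputObject $list -Unique

# Pipeline input: the array is enumerated, so each element is a
# separate input object and the duplicate '1' is dropped.
$viaPipeline = $list | Select-Object -Unique

$asArgument.Count   # 5
$viaPipeline.Count  # 4
```

In short, the fix is to use the pipeline: $list | Select-Object -Unique.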

  • This is somewhat unfortunate, because passing an already in-memory collection as an argument to a cmdlet is much faster than sending its elements one by one through the pipeline.

    • For instance, compare the runtime of passing 1 million items to Get-Random via direct argument-passing, Get-Random -InputObject (1..1e6), with that of the pipeline, 1..1e6 | Get-Random.

    • Note that this optimization is sometimes also available for other cmdlet parameters; notably, you can pass a collection to Set-Content's -Value parameter as an alternative to piping it, which greatly speeds up writing.
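
A rough way to see the speed difference is to time both forms with Measure-Command. This is a sketch only: absolute timings depend on the machine and PowerShell version, the line count for the Set-Content test is arbitrary, and the temporary file name set-content-demo.txt is a hypothetical choice.

```powershell
$items = 1..1e6

# Get-Random: direct argument-passing vs. pipeline input.
$randomDirect   = Measure-Command { $null = Get-Random -InputObject $items }
$randomPipeline = Measure-Command { $null = $items | Get-Random }

# Set-Content: passing the collection to -Value vs. piping it.
# (Hypothetical temp file; removed again below.)
$path  = Join-Path ([System.IO.Path]::GetTempPath()) 'set-content-demo.txt'
$lines = foreach ($i in 1..1e5) { "line $i" }
$writeDirect   = Measure-Command { Set-Content -Path $path -Value $lines }
$writePipeline = Measure-Command { $lines | Set-Content -Path $path }
Remove-Item -Path $path

# Report the timings; expect the direct forms to be noticeably faster.
'Get-Random  -InputObject: {0,8:N0} ms' -f $randomDirect.TotalMilliseconds
'Get-Random  pipeline:     {0,8:N0} ms' -f $randomPipeline.TotalMilliseconds
'Set-Content -Value:       {0,8:N0} ms' -f $writeDirect.TotalMilliseconds
'Set-Content pipeline:     {0,8:N0} ms' -f $writePipeline.TotalMilliseconds
```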


Here's a categorized and alphabetically sorted list of built-in cmdlets:

  • Category A: USEFUL DISTINCTION between pipeline input and explicit -InputObject use: to process collections as a whole, pass them to -InputObject; to process their elements one by one, use the pipeline:

    • Add-Member
    • Export-Clixml
    • Get-Member
    • Trace-Command
  • Category B: USEFUL EQUIVALENCE for flat collections: you can pass flat collections directly to -InputObject to speed up processing:

    • ConvertFrom-Csv
    • Format-Custom
    • Format-List
    • Format-Table
    • Format-Wide
    • Get-Random
    • Join-String
    • Out-Host
    • Out-String
  • Category C: USELESS DISTINCTION: Direct -InputObject use is pointless:

    • ConvertTo-Csv
    • ConvertTo-Html
    • ConvertTo-Xml
    • Export-Csv
    • ForEach-Object
    • Format-Hex
    • Get-Unique
    • Group-Object
    • Invoke-Command
    • Measure-Command
    • Measure-Object
    • Select-Object
    • Select-String
    • Sort-Object
    • Start-Job
    • Update-List
    • Where-Object
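
The difference between Category A and Category B can be checked directly. A minimal sketch using Get-Member (Category A) and Out-String (Category B) with the question's $list; the variable names are illustrative only:

```powershell
$list = @('1', '2', '3', '1', '4')

# Category A (Get-Member): -InputObject inspects the collection itself,
# whereas the pipeline inspects its elements.
(Get-Member -InputObject $list)[0].TypeName   # System.Object[]
($list | Get-Member)[0].TypeName              # System.String

# Category B (Out-String): for a flat collection, both forms
# produce the same text.
$fromArgument = Out-String -InputObject $list
$fromPipeline = $list | Out-String
$fromArgument -eq $fromPipeline               # True
```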

[1] Differences between pipeline input and -InputObject use with nested collections:

Cmdlets that enumerate their -InputObject arguments perform only one level of enumeration on the input collection and leave nested collections as-is.

By contrast, pipeline use can result in two levels of iteration, as the following Join-String example shows:

PS> Join-String -InputObject ('foo', ('bar', 'baz'))
foobar baz

foo was joined with the stringification of the nested array as a whole (bar baz).

PS> 'foo', ('bar', 'baz') | Join-String
foobarbaz

foo and the enumerated elements of the nested array were joined.

The reason is that, due to the pipeline's enumeration behavior, Join-String receives two separate input objects here: first foo, then the nested array 'bar', 'baz' as a whole. The single-level enumeration is applied to each input object in turn, and the results across all input objects are joined.
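
The first step of that explanation, namely that the pipeline enumerates only the outer collection and sends the nested array as one object, can be observed without Join-String. A minimal sketch; $types is an illustrative variable name:

```powershell
# The pipeline enumerates only the top-level array, so it sends
# two objects: the string 'foo' and the nested array as a whole.
$types = 'foo', ('bar', 'baz') | ForEach-Object { $_.GetType().Name }
$types   # String, then Object[] (one per line)
```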
