Reputation: 2061
i wanted a small logic to compare contents of two arrays & get the value which is not common amongst them using powershell
example if
$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)
$c which is the output should give me the value "6
" which is the output of what's the uncommon value between both the arrays.
Can some one help me out with the same! thanks!
Upvotes: 91
Views: 291946
Reputation: 487
Here is a little bit different idea that uses Compare-Object
but gives output similar to @iRon's answer, where the output contains what is in $a but not in $b and vice versa:
PS > Compare-Object -ReferenceObject (0..5) -DifferenceObject (1..6) -PassThru | ? {$_.SideIndicator -eq '=>'}
6
PS > Compare-Object -ReferenceObject (0..5) -DifferenceObject (1..6) -PassThru | ? {$_.SideIndicator -eq '<='}
0
PS: inspired by @mklement0's comment.
Upvotes: 0
Reputation: 23673
$a = 1..5
$b = 4..8
$Yellow = $a | Where {$b -NotContains $_}
$Yellow
contains all the items in $a
except the ones that are in $b
:
PS C:\> $Yellow
1
2
3
$Blue = $b | Where {$a -NotContains $_}
$Blue
contains all the items in $b
except the ones that are in $a
:
PS C:\> $Blue
6
7
8
$Green = $a | Where {$b -Contains $_}
Not in question, but anyways; Green
contains the items that are in both $a
and $b
.
PS C:\> $Green
4
5
Notes:
Where
is an alias of Where-Object
.-NotContains
operator you might also use:
-NotIn
operator (and swap the operands):$Yellow = $a | Where {$_ -NotIn $b}
$Yellow = $a | Where {-not ($b -eq $_)}
Addendum 12 October 2019
As commented by @xtreampb and @mklement0: although not shown from the example in the question, the task that the question implies (values "not in common") is the symmetric difference between the two input sets (the union of yellow and blue).
The symmetric difference between the $a
and $b
can be literally defined as the union of $Yellow
and $Blue
:
$NotGreen = $Yellow + $Blue
Which is written out:
$NotGreen = ($a | Where {$b -NotContains $_}) + ($b | Where {$a -NotContains $_})
As you might notice, there are quite some (redundant) loops in this syntax: all items in list $a
iterate (using Where
) through items in list $b
(using -NotContains
) and visa versa. Unfortunately the redundancy is difficult to avoid as it is difficult to predict the result of each side. A Hash Table is usually a good solution to improve the performance of redundant loops. For this, I like to redefine the question: Get the values that appear once in the sum of the collections ($a + $b
):
$Count = @{}
$a + $b | ForEach-Object {$Count[$_] += 1}
$Count.Keys | Where-Object {$Count[$_] -eq 1}
By using the ForEach
statement instead of the ForEach-Object
cmdlet and the Where
method instead of the Where-Object
you might increase the performance by a factor 2.5:
$Count = @{}
ForEach ($Item in $a + $b) {$Count[$Item] += 1}
$Count.Keys.Where({$Count[$_] -eq 1})
But Language Integrated Query (LINQ) will easily beat any native PowerShell and native .Net methods (see also High Performance PowerShell with LINQ and mklement0's answer for Can the following Nested foreach loop be simplified in PowerShell?:
To use LINQ you need to explicitly define the array types:
[Int[]]$a = 1..5
[Int[]]$b = 4..8
And use the [Linq.Enumerable]::
operator:
$Yellow = [Int[]][Linq.Enumerable]::Except($a, $b)
$Blue = [Int[]][Linq.Enumerable]::Except($b, $a)
$Green = [Int[]][Linq.Enumerable]::Intersect($a, $b)
$NotGreen = [Int[]]([Linq.Enumerable]::Except($a, $b) + [Linq.Enumerable]::Except($b, $a))
(Added 2022-05-02)
There is actually another way to get the symmetric difference which is using the SymmetricExceptWith
method of the HashSet
class, for a details see the specific answer from mklement0 on Find what is different in two very large lists:
$a = [System.Collections.Generic.HashSet[int]](1..5)
$b = [System.Collections.Generic.HashSet[int]](4..8)
$a.SymmetricExceptWith($b)
$NotGreen = $a # note that the result will be stored back in $a
(Updated 2022-05-02, thanks @Santiago for the improved benchmark script)
Benchmark results highly depend on the sizes of the collections and how many items there are actually shared. Besides there is a caveat with drawing conclussions on methods that use
lazy evaluation (also called deferred execution) as with LINQ and the SymmetricExceptWith
where actually pulling the result (e.g. @($a)[0]
) causes the expression to be evaluated and therefore might take longer than expected as nothing has been done yet other than defining what should be done. See also: Fastest Way to get a uniquely index item from the property of an array
Anyways, as an "average", I am presuming that half of each collection is shared with the other.
Test TotalMilliseconds
---- -----------------
Compare-Object 118.5942
Where-Object 275.6602
ForEach-Object 52.8875
foreach 25.7626
Linq 14.2044
SymmetricExce… 7.6329
To get a good performance comparison, caches should be cleared by e.g. starting a fresh PowerShell session.
[Int[]]$arrA = 1..1000
[Int[]]$arrB = 500..1500
Measure-Command {&{
$a = $arrA
$b = $arrB
Compare-Object -ReferenceObject $a -DifferenceObject $b -PassThru
}} |Select-Object @{N='Test';E={'Compare-Object'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
($a | Where {$b -NotContains $_}), ($b | Where {$a -NotContains $_})
}} |Select-Object @{N='Test';E={'Where-Object'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
$Count = @{}
$a + $b | ForEach-Object {$Count[$_] += 1}
$Count.Keys | Where-Object {$Count[$_] -eq 1}
}} |Select-Object @{N='Test';E={'ForEach-Object'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
$Count = @{}
ForEach ($Item in $a + $b) {$Count[$Item] += 1}
$Count.Keys.Where({$Count[$_] -eq 1}) # => should be foreach($key in $Count.Keys) {if($Count[$key] -eq 1) { $key }} for fairness
}} |Select-Object @{N='Test';E={'foreach'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
[Int[]]([Linq.Enumerable]::Except($a, $b) + [Linq.Enumerable]::Except($b, $a))
}} |Select-Object @{N='Test';E={'Linq'}}, TotalMilliseconds
Measure-Command {&{
$a = $arrA
$b = $arrB
($r = [System.Collections.Generic.HashSet[int]]::new($a)).SymmetricExceptWith($b)
}} |Select-Object @{N='Test';E={'SymmetricExceptWith'}}, TotalMilliseconds
Upvotes: 132
Reputation: 683
Your results will not be helpful unless the arrays are first sorted. To sort an array, run it through Sort-Object.
$x = @(5,1,4,2,3)
$y = @(2,4,6,1,3,5)
Compare-Object -ReferenceObject ($x | Sort-Object) -DifferenceObject ($y | Sort-Object)
Upvotes: 4
Reputation: 3171
Try:
$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)
(Compare-Object $a1 $b1).InputObject
Or, you can use:
(Compare-Object $b1 $a1).InputObject
The order doesn't matter.
Upvotes: 3
Reputation: 126762
PS > $c = Compare-Object -ReferenceObject (1..5) -DifferenceObject (1..6) -PassThru
PS > $c
6
Upvotes: 121
Reputation: 4454
This should help, uses simple hash table.
$a1=@(1,2,3,4,5) $b1=@(1,2,3,4,5,6)
$hash= @{}
#storing elements of $a1 in hash
foreach ($i in $a1)
{$hash.Add($i, "present")}
#define blank array $c
$c = @()
#adding uncommon ones in second array to $c and removing common ones from hash
foreach($j in $b1)
{
if(!$hash.ContainsKey($j)){$c = $c+$j}
else {hash.Remove($j)}
}
#now hash is left with uncommon ones in first array, so add them to $c
foreach($k in $hash.keys)
{
$c = $c + $k
}
Upvotes: 1
Reputation: 29449
Look at Compare-Object
Compare-Object $a1 $b1 | ForEach-Object { $_.InputObject }
Or if you would like to know where the object belongs to, then look at SideIndicator:
$a1=@(1,2,3,4,5,8)
$b1=@(1,2,3,4,5,6)
Compare-Object $a1 $b1
Upvotes: 15