Reputation: 23
Using PowerShell, I have 14 arrays of strings, some of which are empty. How would I get the intersection (all elements that exist in every array) of these arrays, excluding the empty ones? I am trying to avoid comparing two arrays at a time.
Any ideas on how I would approach this? Thank you.
$a = @('hjiejnfnfsd','test','huiwe','test2')
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')
My attempt to solve this (although it does not check for empty arrays):
$overlap = Compare-Object $a $b -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $c -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $d -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $e -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $f -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $g -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $h -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $i -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $j -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $k -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $l -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $m -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $n -PassThru -IncludeEqual -ExcludeDifferent
My desired result is that test and test2 appear in $overlap. This solution does not work because it does not check if the array it is comparing is empty.
Upvotes: 2
Views: 683
Reputation: 438283
Note: The following assumes that no individual array contains the same string more than once (more work would be needed to address that).
$a = @('hjiejnfnfsd','test','huiwe','test2')
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')
$allArrays = $a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n
# Initialize a hashtable in which we'll keep
# track of unique strings and how often they occur.
$ht = @{}
# Count the non-empty arrays as we go.
$nonEmptyArrayCount = 0
# Loop over all arrays.
foreach ($arr in $allArrays) {
    # Loop over each non-empty array's elements.
    if ($arr.Count -gt 0) {
        ++$nonEmptyArrayCount
        foreach ($el in $arr) {
            # Add each string and increment its occurrence count.
            $ht[$el] += 1
        }
    }
}
# Output all strings that occurred in every non-empty array
$ht.GetEnumerator() |
    Where-Object Value -eq $nonEmptyArrayCount |
    ForEach-Object Key
The above outputs the strings that are present in all of the non-empty input arrays (hashtable enumeration order isn't guaranteed, so the order may differ):
test2
test
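The note at the top of this answer says duplicates within a single array need more work. One minimal sketch, assuming a string should count at most once per array: collapse each non-empty array into a HashSet[string] before counting. The case-insensitive comparer is a choice made here to match the case-insensitive keys of a `@{}` hashtable, and the sample data is hypothetical.

```powershell
# Hypothetical sample data: 'test' appears twice in the first array.
$allArrays = @('test', 'test', 'x'), @('test', 'y'), @()

$ht = @{}   # @{} keys are case-insensitive in PowerShell
$nonEmptyArrayCount = 0
foreach ($arr in $allArrays) {
    if ($arr.Count -gt 0) {
        ++$nonEmptyArrayCount
        # Collapse duplicates within this array (case-insensitively, like the hashtable keys).
        $unique = [System.Collections.Generic.HashSet[string]]::new(
            [string[]] $arr, [System.StringComparer]::OrdinalIgnoreCase)
        foreach ($el in $unique) {
            $ht[$el] += 1
        }
    }
}

# Strings that occur in every non-empty array
$common = $ht.GetEnumerator() |
    Where-Object Value -eq $nonEmptyArrayCount |
    ForEach-Object Key
$common
```

With this data, 'test' is counted once for each of the two non-empty arrays and is the only string output; without the deduplication step it would be counted three times against a non-empty count of two, and would be missed.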
Upvotes: 2
Reputation: 27766
Here is a solution using a HashSet. A HashSet is a collection that stores only unique items and provides fast lookup, which makes it a good choice for calculating intersections. It even has an IntersectWith method, which accepts any enumerable type (such as an array) as its argument. The method modifies the original HashSet so that it contains only the elements that are in both the HashSet and the argument passed to the method.
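A small sketch of IntersectWith on its own, with hypothetical inputs; the method returns nothing and changes the set in place:

```powershell
# Build a set from a hypothetical first array.
$set = [System.Collections.Generic.HashSet[object]]::new([object[]] @('test', 'abc', 'test2'))

# Keep only the elements that also appear in the second array.
$set.IntersectWith([object[]] @('test2', 'xyz', 'test'))

$set.Count              # 2
$set.Contains('abc')    # False
```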
# Test input
$a = @() # I changed this to empty array for demonstration purposes
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')
# Create a variable with a type-constraint
[Collections.Generic.Hashset[object]] $overlap = $null
# For each of the arrays...
($a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n).
    Where{ $_.Count -gt 0 }. #... except the empty ones
    ForEach{
        # If the Hashset has not been initialized yet
        if( $null -eq $overlap ) {
            # Create the initial hashset from the first non-empty array.
            $overlap = $_
        }
        else {
            # Hashset is already initialized, calculate the intersection with next non-empty array.
            $overlap.IntersectWith( $_ )
        }
    }
$overlap # Output
Output:
test
test2
Remarks:
To filter out empty arrays (or, in general, any kind of collection), we check the Count member, which gives the number of elements.
.ForEach and .Where are PowerShell intrinsic methods. They can be faster than the ForEach-Object and Where-Object commands, especially when working directly with collections (as opposed to the output of another command). The automatic variable $_ represents the current element of the collection, as usual.
This code using pipeline commands is functionally the same:
[Collections.Generic.Hashset[object]] $overlap = $null
$a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n |
    Where-Object Count -gt 0 |
    ForEach-Object{
        if( $null -eq $overlap ) {
            $overlap = $_
        }
        else {
            $overlap.IntersectWith( $_ )
        }
    }
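Beyond the script-block-only form used above, the .Where() intrinsic method also accepts a mode as a second argument. As a sketch with hypothetical data, the 'Split' mode separates the non-empty arrays from the empty ones in a single pass:

```powershell
# Hypothetical input: two non-empty arrays and two empty ones.
$allArrays = @('test', 'x'), @(), @('test'), @()

# 'Split' returns two collections: the elements that matched, then those that did not.
$nonEmpty, $empty = $allArrays.Where({ $_.Count -gt 0 }, 'Split')

$nonEmpty.Count   # 2
$empty.Count      # 2
```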
In if( $null -eq $overlap ) it is very important that $null is on the left-hand side of the -eq operator. If it were on the right-hand side, PowerShell would do an element-wise comparison with $null to filter the elements, instead of checking whether the variable $overlap itself is $null (see About Comparison Operators).
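A short sketch of the difference, using a hypothetical array that contains two $null elements:

```powershell
$arr = @(1, $null, 2, $null)   # hypothetical collection with two $null elements

# Collection on the left: -eq filters and returns the matching elements.
$filtered = $arr -eq $null
$filtered.Count           # 2

# An array with more than one element is truthy, so this looks like "$arr is null".
[bool] ($arr -eq $null)   # True

# $null on the left: a plain null check of the variable itself.
$null -eq $arr            # False
```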
In the line $overlap = $_, PowerShell automatically converts the current array into a HashSet, because we previously set a type constraint using [Collections.Generic.Hashset[object]] $overlap and an array is convertible to a HashSet (see About Variables).
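A minimal sketch of that conversion in isolation (the values are hypothetical):

```powershell
# Type-constrained variable, initially $null (as in the code above)
[System.Collections.Generic.HashSet[object]] $set = $null

# Assigning an array converts it to a HashSet; the duplicate 'a' is dropped.
$set = @('a', 'b', 'a')

$set -is [System.Collections.Generic.HashSet[object]]   # True
$set.Count                                               # 2
```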
String comparison in a HashSet[object] is case-sensitive by default. To make it case-insensitive, convert each string to lowercase like this:
[Collections.Generic.Hashset[object]] $overlap = $null
($a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n).
    Where{ $_.Count -gt 0 }.
    ForEach{
        if( $null -eq $overlap ) {
            $overlap = @( $_.ToLower() )
        }
        else {
            $overlap.IntersectWith( @( $_.ToLower() ) )
        }
    }
This uses member-access enumeration to call the String.ToLower() method on each element of the input arrays. Wrapping the result in @() ensures it is still an array when an input array has only one element (member-access enumeration returns a single value in that case).
In the first variant, the line breaks before Where and ForEach (that is, after the dots) aren't really necessary, but they improve code readability. Note that you can't insert a line break before .Where and .ForEach (that is, before the dots), because that confuses the PowerShell parser.
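Instead of lowercasing, another option (a swapped-in technique, not the code above) is a HashSet[string] built with a case-insensitive comparer such as [System.StringComparer]::OrdinalIgnoreCase. IntersectWith compares using the set's own comparer, so the original casing from the first non-empty array is kept. A sketch assuming every element is a string, with hypothetical data:

```powershell
# Hypothetical input with mixed casing and an empty array
$arrays = @('Test', 'abc', 'TEST2'), @(), @('test', 'test2', 'xyz')

$overlap = $null
foreach ($arr in $arrays) {
    if ($arr.Count -eq 0) { continue }   # skip empty arrays
    if ($null -eq $overlap) {
        # First non-empty array: build a set that compares strings case-insensitively.
        $overlap = [System.Collections.Generic.HashSet[string]]::new(
            [string[]] $arr, [System.StringComparer]::OrdinalIgnoreCase)
    }
    else {
        # IntersectWith uses the set's comparer, so 'Test' matches 'test'.
        $overlap.IntersectWith([string[]] $arr)
    }
}
$overlap   # contains 'Test' and 'TEST2' (casing from the first non-empty array)
```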
Upvotes: 2
Reputation: 16606
You're close. Excluding empty arrays from the comparison is essential: the intersection of an empty array with any other array is empty, and once $overlap is empty, that will be the final result regardless of what the subsequent arrays contain.
Here's your code, rewritten using loops and with a check that skips empty arrays...
$a = @('hjiejnfnfsd', 'test', 'huiwe', 'test2')
$b = @('test', 'jnfijweofnew', 'test2')
$c = @('njwifqbfiwej', 'test', 'jnfijweofnew', 'test2')
$d = @('bhfeukefwgu', 'test', 'dasdwdv', 'test2', 'hfuweihfei')
$e = @('test', 'ddwadfedgnh', 'test2')
$f = @('test', 'test2')
$g = @('test', 'bjiewbnefw', 'test2')
$h = @('uie287278hfjf', 'test', 'huiwhiwe', 'test2')
$i = @()
$j = @()
$k = @('jireohngi', 'test', 'gu7y8732hbj', 'test2')
$l = @()
$m = @('test', 'test2')
$n = @('test', 'test2')
# Create an array of arrays $a through $n
$arrays = @(
    # 'a'..'n' doesn't work in Windows PowerShell
    # Define both ends of the range...
    # 'a' → [String]
    # 'a'[0] → [Char]
    # [Int32] 'a'[0] → 97 (ASCII a)
    # ...and cast each element back to a [Char]
    [Char[]] ([Int32] 'a'[0]..[Int32] 'n'[0]) |
        Get-Variable -ValueOnly
)
# Initialize $overlap to the first non-empty array
for ($initialOverlapIndex = 0; $initialOverlapIndex -lt $arrays.Length; $initialOverlapIndex++)
{
    if ($arrays[$initialOverlapIndex].Length -gt 0)
    {
        break;
    }
}
<#
Alternative (returns -1 rather than $arrays.Length if every array is empty):
$initialOverlapIndex = [Array]::FindIndex(
    $arrays,
    [Predicate[Object]] { param($array) $array.Length -gt 0 }
)
#>
$overlap = $arrays[$initialOverlapIndex]
for ($comparisonIndex = $initialOverlapIndex + 1; $comparisonIndex -lt $arrays.Length; $comparisonIndex++)
# Alternative: foreach ($array in $arrays | Select-Object -Skip ($initialOverlapIndex + 1))
{
    $array = $arrays[$comparisonIndex]
    if ($array.Length -gt 0)
    {
        $overlap = Compare-Object $overlap $array -PassThru -IncludeEqual -ExcludeDifferent
    }
}
$overlap
...which outputs...
test
test2
Upvotes: 2