MaxRax

Reputation: 23

PowerShell: Intersection of more than two arrays

Using PowerShell, I have 14 arrays of strings. Some of the arrays are empty. How would I get the intersection (all elements that exist in all of the arrays) of these arrays (excluding the arrays that are empty)? I am trying to avoid comparing two arrays at a time.

Any ideas on how to approach this? Thank you.

$a = @('hjiejnfnfsd','test','huiwe','test2')
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')

My attempt to solve this (although it does not check for empty arrays):

$overlap = Compare-Object $a $b -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $c -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $d -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $e -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $f -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $g -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $h -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $i -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $j -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $k -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $l -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $m -PassThru -IncludeEqual -ExcludeDifferent
$overlap = Compare-Object $overlap $n -PassThru -IncludeEqual -ExcludeDifferent

The desired result is for $overlap to contain test and test2. My attempt fails because it does not check whether the array being compared is empty.

Upvotes: 2

Views: 683

Answers (3)

mklement0

Reputation: 438283

Note: The following assumes that no individual array contains the same string more than once (more work would be needed to address that).

$a = @('hjiejnfnfsd','test','huiwe','test2')
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')

$allArrays = $a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n

# Initialize a hashtable in which we'll keep
# track of unique strings and how often they occur.
$ht = @{}

# Loop over all arrays.
$nonEmptyArrayCount = 0
foreach ($arr in $allArrays) {
  # Process only non-empty arrays, counting how many there are.
  if ($arr.Count -gt 0) {
    ++$nonEmptyArrayCount
    foreach ($el in $arr) {
      # Add each string and increment its occurrence count.
      $ht[$el] += 1
    }
  }
}

# Output all strings that occurred in every non-empty array
$ht.GetEnumerator() |
  Where-Object Value -eq $nonEmptyArrayCount |
  ForEach-Object Key

The above outputs those strings that are present in all of the non-empty input arrays (in no guaranteed order, because hashtable enumeration order is unspecified):

test2
test
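
The note at the top says that duplicates within a single array would need more work. As a minimal sketch of one way to handle that (an assumption on my part, not a tested extension of the code above), each array can be de-duplicated before counting. The sample data here is hypothetical. Sort-Object -Unique is used because it ignores case by default, which matches the case-insensitive keys of a @{} hashtable.

```powershell
# Hypothetical sample data: 'test' appears twice in the first array.
$allArrays = @('test', 'test', 'other'), @('test', 'test2'), @(), @('test2', 'test')

$ht = @{}
$nonEmptyArrayCount = 0
foreach ($arr in $allArrays) {
  if ($arr.Count -gt 0) {
    ++$nonEmptyArrayCount
    # De-duplicate first so that each string counts at most once per array.
    foreach ($el in [string[]] ($arr | Sort-Object -Unique)) {
      $ht[$el] += 1
    }
  }
}

# Keep the strings whose count equals the number of non-empty arrays.
$result = $ht.GetEnumerator() |
  Where-Object Value -eq $nonEmptyArrayCount |
  ForEach-Object Key

$result
```

Without the de-duplication step, 'test' would be counted four times against three non-empty arrays and would be dropped from the result.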

Upvotes: 2

zett42

Reputation: 27766

Here is a solution using a HashSet. A HashSet is a collection that stores only unique items and provides fast lookup, which makes it a good choice for computing intersections. Its IntersectWith method accepts any enumerable type (such as an array) as its argument and modifies the HashSet in place, so that it retains only the elements that are also present in the argument.

# Test input
$a = @()     # I changed this to empty array for demonstration purposes
$b = @('test','jnfijweofnew','test2')
$c = @('njwifqbfiwej','test','jnfijweofnew','test2')
$d = @('bhfeukefwgu','test','dasdwdv','test2','hfuweihfei')
$e = @('test','ddwadfedgnh','test2')
$f = @('test','test2')
$g = @('test','bjiewbnefw','test2')
$h = @('uie287278hfjf','test','huiwhiwe','test2')
$i = @()
$j = @()
$k = @('jireohngi','test','gu7y8732hbj','test2')
$l = @()
$m = @('test','test2')
$n = @('test','test2')

# Create a variable with a type-constraint
[Collections.Generic.Hashset[object]] $overlap = $null

# For each of the arrays...
($a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n).
    Where{ $_.Count -gt 0 }.           #... except the empty ones
    ForEach{
        # If the Hashset has not been initialized yet
        if( $null -eq $overlap ) {
            # Create the initial hashset from the first non-empty array.
            $overlap = $_
        }
        else { 
            # Hashset is already initialized, calculate the intersection with next non-empty array.
            $overlap.IntersectWith( $_ )
        }
    }

$overlap  # Output

Output:

test
test2

Remarks:

  • To filter out empty arrays (or in general any kind of collection), we check its Count member, which gives the number of elements.

  • .Foreach and .Where are PowerShell intrinsic methods. These can be faster than the ForEach-Object and Where-Object commands, especially when working directly with collections (as opposed to output of another command). The automatic variable $_ represents the current element of the collection, as usual.

    This code using pipeline commands is functionally the same:

    [Collections.Generic.Hashset[object]] $overlap = $null
    
    $a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n |
        Where-Object Count -gt 0 |           
        ForEach-Object{
            if( $null -eq $overlap ) {  
                $overlap = $_ 
            }
            else { 
                $overlap.IntersectWith( $_ )   
            }
        }
    
  • In if( $null -eq $overlap ) it is very important that $null is on the left-hand side of the -eq operator. If it were on the right-hand side, PowerShell would do an element-wise comparison with $null to filter elements instead of checking whether the variable $overlap itself is $null (see About Comparison Operators).

  • In the line $overlap = $_ PowerShell automatically converts the current array into a HashSet, because we previously set a type constraint using [Collections.Generic.Hashset[object]] $overlap and an array is convertible to a HashSet (see About Variables).

  • String comparison in a HashSet is case-sensitive by default. To make it case-insensitive, convert each string to lowercase like this:

    [Collections.Generic.Hashset[object]] $overlap = $null
    
    ($a, $b, $c, $d, $e, $f, $g, $h, $i, $j, $k, $l, $m, $n).
        Where{ $_.Count -gt 0 }.
        ForEach{
            if( $null -eq $overlap ) {
                $overlap = $_.ToLower()
            }
            else { 
                $overlap.IntersectWith( $_.ToLower() )
            }
        }
    

    This uses member access enumeration to call the String.ToLower() method for each element of the input arrays.

  • In the first variant, the line breaks after each dot (before Where and ForEach) are not strictly necessary, but they improve readability. Note that the dot must stay at the end of the preceding line: a line break before .Where or .ForEach confuses the PowerShell parser.
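
As an alternative to lowercasing for the case-insensitive remark above, here is a minimal sketch that passes a case-insensitive comparer to the HashSet constructor. It assumes PowerShell 5 or later (for the ::new syntax) and uses hypothetical sample data. Unlike the ToLower() variant, it keeps the casing found in the first non-empty array.

```powershell
# Hypothetical sample data with mixed casing.
$arrays = @('Test', 'foo', 'TEST2'), @(), @('test', 'test2', 'bar'), @('tEsT', 'Test2')

$overlap = $null
foreach ($arr in $arrays) {
    if ($arr.Count -eq 0) { continue }   # skip empty arrays

    if ($null -eq $overlap) {
        # Seed the set from the first non-empty array,
        # using a comparer that ignores case.
        $overlap = [System.Collections.Generic.HashSet[string]]::new(
            [string[]] $arr,
            [System.StringComparer]::OrdinalIgnoreCase
        )
    }
    else {
        # Keep only the elements that also occur in this array.
        $overlap.IntersectWith([string[]] $arr)
    }
}

$overlap  # Output
```

The comparer is fixed when the set is created, so every later IntersectWith call uses it without any per-string conversion.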

Upvotes: 2

Lance U. Matthews

Reputation: 16606

You're close. Excluding empty arrays from the comparison is essential because the intersection of an empty array with any other array is empty; once $overlap is empty, that is the final result regardless of what the subsequent arrays contain.

Here's your code with the non-empty check and rewritten using loops...

$a = @('hjiejnfnfsd', 'test', 'huiwe', 'test2')
$b = @('test', 'jnfijweofnew', 'test2')
$c = @('njwifqbfiwej', 'test', 'jnfijweofnew', 'test2')
$d = @('bhfeukefwgu', 'test', 'dasdwdv', 'test2', 'hfuweihfei')
$e = @('test', 'ddwadfedgnh', 'test2')
$f = @('test', 'test2')
$g = @('test', 'bjiewbnefw', 'test2')
$h = @('uie287278hfjf', 'test', 'huiwhiwe', 'test2')
$i = @()
$j = @()
$k = @('jireohngi', 'test', 'gu7y8732hbj', 'test2')
$l = @()
$m = @('test', 'test2')
$n = @('test', 'test2')

# Create an array of arrays $a through $n
$arrays = @(
    # 'a'..'n' doesn't work in Windows PowerShell
    # Define both ends of the range...
    #             'a'    → [String]
    #             'a'[0] → [Char]
    #     [Int32] 'a'[0] → 97 (ASCII a)
    # ...and cast each element back to a [Char]
    [Char[]] ([Int32] 'a'[0]..[Int32] 'n'[0]) |
        Get-Variable -ValueOnly
)

# Initialize $overlap to the first non-empty array
for ($initialOverlapIndex = 0; $initialOverlapIndex -lt $arrays.Length; $initialOverlapIndex++)
{
    if ($arrays[$initialOverlapIndex].Length -gt 0)
    {
        break;
    }
}
<#
    Alternative:
        $initialOverlapIndex = [Array]::FindIndex(
            $arrays,
            [Predicate[Array]] { param($array) $array.Length -gt 0 }
        )
#>
$overlap = $arrays[$initialOverlapIndex]

for ($comparisonIndex = $initialOverlapIndex + 1; $comparisonIndex -lt $arrays.Length; $comparisonIndex++)
# Alternative: foreach ($array in $arrays | Select-Object -Skip ($initialOverlapIndex + 1))
{
    $array = $arrays[$comparisonIndex]
    if ($array.Length -gt 0)
    {
        $overlap = Compare-Object $overlap $array -PassThru -IncludeEqual -ExcludeDifferent
    }
}

$overlap

...which outputs...

test
test2
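
The same idea can be wrapped in a function that also stops early once nothing is left in common. This is only a sketch under my own assumptions: Get-ArrayIntersection is a hypothetical name, and the sample data is made up. It relies on Compare-Object emitting nothing when there are no equal elements, which leaves $overlap equal to $null.

```powershell
# Hypothetical helper: intersect all non-empty arrays via Compare-Object.
function Get-ArrayIntersection {
    param([object[]] $Arrays)

    $overlap = $null
    $seeded = $false
    foreach ($array in $Arrays) {
        if ($array.Count -eq 0) { continue }   # ignore empty arrays

        if (-not $seeded) {
            # The first non-empty array is the starting point.
            $overlap = $array
            $seeded = $true
            continue
        }

        # Nothing in common any more: later arrays cannot change that.
        if ($null -eq $overlap) { break }

        $overlap = Compare-Object $overlap $array -PassThru -IncludeEqual -ExcludeDifferent
    }
    $overlap
}

$sample = @('x', 'test', 'test2'), @(), @('test', 'test2', 'y'), @('test2', 'test')
$common = Get-ArrayIntersection -Arrays $sample

$disjoint = @('a', 'b'), @(), @('c'), @('a')
$none = Get-ArrayIntersection -Arrays $disjoint
```

With this sample data, $common should hold test and test2, and $none should be empty because the second non-empty array shares nothing with the first.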

Upvotes: 2
