Vasiliy Pupkin
Vasiliy Pupkin

Reputation: 33

How use linq explict in powershell or analogue of "NOT IN" in SQL

I have a question about using Linq in PowerShell. I can not figure out how to correctly use the Except method

Example tables:

$Arr = 1..1000
$Props = ("employeeID","FindName1","FindName2")
$Table1 = New-Object System.Data.DataTable "Table1"
$Props | ForEach-Object { $Table1.Columns.Add( $_ , [String]) | Out-Null }

ForEach ($Record in $Arr ) {
    $Row = $Table1.NewRow()
    $Row.employeeID = $Record.ToString("00000")
    $Row.FindName1 = "UserName_" + $Record.ToString()
    $Row.FindName2 = "String_" + $Record.ToString("00000000")
    $Table1.Rows.Add($Row)
}

$Arr2 = 980..1111
$Props = ("employeeID","FindName1")
$Table2 = New-Object System.Data.DataTable "Table2"
$Props | ForEach-Object { $Table2.Columns.Add( $_ , [String]) | Out-Null }

ForEach ($Record in $Arr2 ) {
    $Row = $Table2.NewRow()
    $Row.employeeID = $Record.ToString("00000")
    $Row.FindName1 = "UserName_" + $Record.ToString()
    $Table2.Rows.Add($Row)
}

As a result of the work, I want to get records from the $table1 where FindName1 not in $Table2.FindName1, preserving all the headers

An attempt to perform does not produce the expected result.

$ExceptOut = [System.Linq.Enumerable]::Except($Table1.FindName1, $Table2.FindName1)

As I understood from the article , i need to create my own class with methods that allow me to use a LINQ in the tables. But I am extremely far from programming. Or maybe there is some other fast analogue of "NOT IN" in SQL. I hope for help. Thanks.

Upvotes: 3

Views: 1789

Answers (2)

mklement0
mklement0

Reputation: 439193

To complement the LINQ-based answer with a native PowerShell solution:

The Compare-Object cmdlet allows you to compare collections, but note that while it is more concise, it is also much slower than the LINQ-based solution:

Compare-Object -PassThru -Property FindName1 `
  ([Data.DataRow[]] $Table1.Rows) `
  ([Data.DataRow[]] $Table2.Rows) | Where-Object SideIndicator -eq '<='
  • Casting [Data.DataRow[]] - which creates a new array from the rows collection - is seemingly needed for Compare-Object to recognize the rows as an enumerable.

    • Calling .GetEnumerator() or casting to Collections.IEnumerable doesn't help, and casting to Collections.Generic.IEnumerable[Data.DataRow]] fails.
  • -Property FindName1 specifies the comparison property, i.e., the property to compare the rows by.

  • -PassThru is needed to make Compare-Object output the input objects as-is, instead of custom objects that contain only the property/ies specified with -Property.

    • Note that the objects are decorated with a .SideIndicator NoteProperty member, however, using PowerShell's ETS (extended type system) - see below.
  • Given that Compare-Object outputs input objects that are unique to either collection, Where-Object SideIndicator -eq '<=' must be used to limit the results to those difference objects that are unique to the LHS input collection (which is signaled via a .SideIndicator property value of '<=' - the arrow points to the side the object is unique to).

GitHub issue #4316 proposes a number of improvements to the Compare-Object cmdlet, which could help simplify and speed up the above solution.
That said, the proposal to make LINQ a first-class PowerShell citizen, #2226, holds much more promise.

Upvotes: 1

mklement0
mklement0

Reputation: 439193

For the (generic) set-difference .Except() LINQ method to work, the two enumerables (IEnumerable<T>) passed as arguments must:

  • enumerate instances of the same type T

    • However, it is possible to use Object for T, thereby effectively supporting PowerShell's potentially mixed-type regular Object[] arrays ([object[]] in PowerShell's type-literal notation).
  • and, if that type is a reference type whose instance should compare meaningfully based on the content of instances (rather than by mere reference equality, i.e. identity), must implement the IEquatable<T> interface and/or override the .Equals() method (and therefore also the .GetHashCode() method).

PowerShell is seemingly not able to find the right overload for .Except() with the [object[]] arrays returned by $Table1.FindName1 and $Table2.FindName1 (also see the note re v7.3+ below), though these arrays technically fulfill the above requirements - I don't know why.

However, simply casting one of these arrays to what it already is - [object[]] - solves the problem:

[Linq.Enumerable]::Except([object[]] $Table1.FindName1, $Table2.FindName1)

Note:

  • As shown above, it is sufficient to cast one of the enumerables to [object[]] in order for PowerShell to infer the correct generic type argument for the method.

    • In PowerShell (Core) 7.3+ it is now possible to call generic methods with explicit type arguments (see about_Calling_Generic_Methods), which allows simplifying the solution to:

      # Note the '[object]' right after 'Except',
      # specifying the generic type argument.
      [Linq.Enumerable]::Except[object]($Table1.FindName1, $Table2.FindName1)
      
  • Given that the .FindName1 column ultimately contains strings, you can also cast - both - enumerables to [string[]], though this implicitly creates a copy of each array, which is unnecessary here.


Now if you want to return whole rows while using the .FindName1 column only for comparison, things get much more complex:

  • You must implement a custom comparer class that implements the IEqualityComparer[T]interface.

  • You must cast the .Rows collection of the data tables to IEnumerable[DataRow], which requires calling the System.Linq.Enumerable.Cast() method via reflection (again, see further below for a simpler v7.3+ solution).

    • Note: While you could directly cast to [DataRow[]], this would involve inefficient conversion of the rows collection to an array.

Here's a PSv5+ solution that implements the custom comparer class as a PowerShell class:

# A custom comparer class that compares two DataRow instances by their
# .FindName1 column.
class CustomTableComparer : Collections.Generic.IEqualityComparer[Data.DataRow] {
  [bool] Equals([Data.DataRow] $x, [Data.DataRow] $y) {
    return [string]::Equals($x.FindName1, $y.FindName1, 'Ordinal')
  }
  [int] GetHashCode([Data.DataRow] $row) {
    # Note: Any two rows for which Equals() returns $true must return the same
    #       hash code. Because *ordinal, case-sensitive* string comparison is
    #       used above, it's sufficient to simply call .GetHashCode() on
    #       the .FindName1 property value, but that would have to be tweaked
    #       for other types of string comparisons.
    return $row.FindName1.GetHashCode();
  }
}


# Use reflection to get a reference to a .Cast() method instantiation 
# that casts to IEnumerable<DataRow>.
$toIEnumerable = [Linq.Enumerable].GetMethod('Cast').MakeGenericMethod([Data.DataRow])

# Call .Except() with the casts and the custom comparer.
# Note the need to wrap the .Rows value in an aux. single-element
# array - (, ...) - for it to be treated as a single argument.
[Linq.Enumerable]::Except(
    $toIEnumerable.Invoke($null, (, $Table1.Rows)), 
    $toIEnumerable.Invoke($null, (, $Table2.Rows)), 
    [CustomTableComparer]::new()
)

PowerShell 7.3+ simplification:

The ability to specify generic method type arguments directly makes the reflection-based approach unnecessary and simplifies the approach:

[Linq.Enumerable]::Except[Data.DataRow](
  [Linq.Enumerable]::Cast[Data.DataRow]($Table1.Rows), 
  [Linq.Enumerable]::Cast[Data.DataRow]($Table2.Rows), 
  [CustomTableComparer]::new()
)

Note: The [Linq.Enumerable]::Cast[Data.DataRow]() calls are still needed, because the .Rows property of System.Data.DataTable instances implements only IEnumerable, not also IEnumerable[System.Data.DataRow].


GitHub issue #2226 proposes making LINQ a first-class PowerShell citizen.

Upvotes: 6

Related Questions