Reputation: 33
I have a question about using Linq in PowerShell. I can not figure out how to correctly use the Except
method
Example tables:
$Arr = 1..1000
$Props = ("employeeID","FindName1","FindName2")
$Table1 = New-Object System.Data.DataTable "Table1"
$Props | ForEach-Object { $Table1.Columns.Add( $_ , [String]) | Out-Null }
ForEach ($Record in $Arr ) {
$Row = $Table1.NewRow()
$Row.employeeID = $Record.ToString("00000")
$Row.FindName1 = "UserName_" + $Record.ToString()
$Row.FindName2 = "String_" + $Record.ToString("00000000")
$Table1.Rows.Add($Row)
}
$Arr2 = 980..1111
$Props = ("employeeID","FindName1")
$Table2 = New-Object System.Data.DataTable "Table2"
$Props | ForEach-Object { $Table2.Columns.Add( $_ , [String]) | Out-Null }
ForEach ($Record in $Arr2 ) {
$Row = $Table2.NewRow()
$Row.employeeID = $Record.ToString("00000")
$Row.FindName1 = "UserName_" + $Record.ToString()
$Table2.Rows.Add($Row)
}
As a result of the work, I want to get records from the $table1
where FindName1 not in $Table2.FindName1
, preserving all the headers
An attempt to perform does not produce the expected result.
$ExceptOut = [System.Linq.Enumerable]::Except($Table1.FindName1, $Table2.FindName1)
As I understood from the article , i need to create my own class with methods that allow me to use a LINQ in the tables. But I am extremely far from programming. Or maybe there is some other fast analogue of "NOT IN"
in SQL.
I hope for help. Thanks.
Upvotes: 3
Views: 1789
Reputation: 439193
To complement the LINQ-based answer with a native PowerShell solution:
The Compare-Object
cmdlet allows you to compare collections, but note that while it is more concise, it is also much slower than the LINQ-based solution:
Compare-Object -PassThru -Property FindName1 `
([Data.DataRow[]] $Table1.Rows) `
([Data.DataRow[]] $Table2.Rows) | Where-Object SideIndicator -eq '<='
Casting [Data.DataRow[]]
- which creates a new array from the rows collection - is seemingly needed for Compare-Object
to recognize the rows as an enumerable.
.GetEnumerator()
or casting to Collections.IEnumerable
doesn't help, and casting to Collections.Generic.IEnumerable[Data.DataRow]]
fails.-Property FindName1
specifies the comparison property, i.e., the property to compare the rows by.
-PassThru
is needed to make Compare-Object
output the input objects as-is, instead of custom objects that contain only the property/ies specified with -Property
.
.SideIndicator
NoteProperty member, however, using PowerShell's ETS (extended type system) - see below.Given that Compare-Object
outputs input objects that are unique to either collection, Where-Object SideIndicator -eq '<='
must be used to limit the results to those difference objects that are unique to the LHS input collection (which is signaled via a .SideIndicator
property value of '<='
- the arrow points to the side the object is unique to).
GitHub issue #4316 proposes a number of improvements to the Compare-Object
cmdlet, which could help simplify and speed up the above solution.
That said, the proposal to make LINQ a first-class PowerShell citizen, #2226, holds much more promise.
Upvotes: 1
Reputation: 439193
For the (generic) set-difference .Except()
LINQ method to work, the two enumerables (IEnumerable<T>
) passed as arguments must:
enumerate instances of the same type T
Object
for T
, thereby effectively supporting PowerShell's potentially mixed-type regular Object[]
arrays ([object[]]
in PowerShell's type-literal notation).and, if that type is a reference type whose instance should compare meaningfully based on the content of instances (rather than by mere reference equality, i.e. identity), must implement the IEquatable<T>
interface and/or override the .Equals()
method (and therefore also the .GetHashCode()
method).
PowerShell is seemingly not able to find the right overload for .Except()
with the [object[]]
arrays returned by $Table1.FindName1
and $Table2.FindName1
(also see the note re v7.3+ below), though these arrays technically fulfill the above requirements - I don't know why.
However, simply casting one of these arrays to what it already is - [object[]]
- solves the problem:
[Linq.Enumerable]::Except([object[]] $Table1.FindName1, $Table2.FindName1)
Note:
As shown above, it is sufficient to cast one of the enumerables to [object[]]
in order for PowerShell to infer the correct generic type argument for the method.
In PowerShell (Core) 7.3+ it is now possible to call generic methods with explicit type arguments (see about_Calling_Generic_Methods), which allows simplifying the solution to:
# Note the '[object]' right after 'Except',
# specifying the generic type argument.
[Linq.Enumerable]::Except[object]($Table1.FindName1, $Table2.FindName1)
Given that the .FindName1
column ultimately contains strings, you can also cast - both - enumerables to [string[]]
, though this implicitly creates a copy of each array, which is unnecessary here.
Now if you want to return whole rows while using the .FindName1
column only for comparison, things get much more complex:
You must implement a custom comparer class that implements the IEqualityComparer[T]
interface.
You must cast the .Rows
collection of the data tables to IEnumerable[DataRow]
, which requires calling the System.Linq.Enumerable.Cast()
method via reflection (again, see further below for a simpler v7.3+ solution).
[DataRow[]]
, this would involve inefficient conversion of the rows collection to an array.Here's a PSv5+ solution that implements the custom comparer class as a PowerShell class:
# A custom comparer class that compares two DataRow instances by their
# .FindName1 column.
class CustomTableComparer : Collections.Generic.IEqualityComparer[Data.DataRow] {
[bool] Equals([Data.DataRow] $x, [Data.DataRow] $y) {
return [string]::Equals($x.FindName1, $y.FindName1, 'Ordinal')
}
[int] GetHashCode([Data.DataRow] $row) {
# Note: Any two rows for which Equals() returns $true must return the same
# hash code. Because *ordinal, case-sensitive* string comparison is
# used above, it's sufficient to simply call .GetHashCode() on
# the .FindName1 property value, but that would have to be tweaked
# for other types of string comparisons.
return $row.FindName1.GetHashCode();
}
}
# Use reflection to get a reference to a .Cast() method instantiation
# that casts to IEnumerable<DataRow>.
$toIEnumerable = [Linq.Enumerable].GetMethod('Cast').MakeGenericMethod([Data.DataRow])
# Call .Except() with the casts and the custom comparer.
# Note the need to wrap the .Rows value in an aux. single-element
# array - (, ...) - for it to be treated as a single argument.
[Linq.Enumerable]::Except(
$toIEnumerable.Invoke($null, (, $Table1.Rows)),
$toIEnumerable.Invoke($null, (, $Table2.Rows)),
[CustomTableComparer]::new()
)
PowerShell 7.3+ simplification:
The ability to specify generic method type arguments directly makes the reflection-based approach unnecessary and simplifies the approach:
[Linq.Enumerable]::Except[Data.DataRow](
[Linq.Enumerable]::Cast[Data.DataRow]($Table1.Rows),
[Linq.Enumerable]::Cast[Data.DataRow]($Table2.Rows),
[CustomTableComparer]::new()
)
Note: The [Linq.Enumerable]::Cast[Data.DataRow]()
calls are still needed, because the .Rows
property of System.Data.DataTable
instances implements only IEnumerable
, not also IEnumerable[System.Data.DataRow]
.
GitHub issue #2226 proposes making LINQ a first-class PowerShell citizen.
Upvotes: 6