Reputation: 569
I feel like this shouldn't be that hard, but I'm having trouble getting a data structure in mind that will give me what I want. I have a large amount of data and I need to find the instances where there are multiple Secondary Identifiers
as defined below.
Primary Identifier,Secondary Identifier,Fruit
11111,1,apple
11111,1,pear
22222,1,banana
22222,1,grapefruit
33333,1,apple
33333,1,pear
33333,2,apple
33333,2,orange
That might not be a great example to use - but basically only two of the columns matter. What I'd really like is to return the Primary Identifiers
where the unique count of Secondary Identifiers
is greater than 1. So I'm thinking maybe a HashTable
would be my best bet, but I tried to doing something in a pipeline oriented way and failed so I'm wondering if there is an easier method or Cmdlet
that I haven't tried.
The final array
(or hashtable
) would be something like this:
ID Count of Secondary ID
----- ---------------------
11111 1
22222 1
33333 2
At that point, getting the instances of multiple would be as easy as $array | Where-Object {$_."Count of Secondary ID" -gt 1}
If this example sucks or what I'm after doesn't make sense, let me know and I can rewrite it; but it's almost like I need an implementation of Select-Object -Unique
that would allow you to use two or more input objects/columns. Basically the same as Excel
's remove duplicates and then selecting which headers to include. Except there are too many rows to open in Excel
Upvotes: 1
Views: 1448
Reputation: 174515
Use Group-Object
twice - first to group the objects by common Primary Identifier
, then use Group-Object
again to count the number of distinct Secondary Identifier
's within each group:
$data = @'
Primary Identifier,Secondary Identifier,Fruit
11111,1,apple
11111,1,pear
22222,1,banana
22222,1,grapefruit
33333,1,apple
33333,1,pear
33333,2,apple
33333,2,orange
'@ |ConvertFrom-Csv
$data |Group-Object 'Primary Identifier' |ForEach-Object {
[pscustomobject]@{
# Primary Identifier value will be the name of the group, since that's what we grouped by
'Primary Identifier' = $_.Name
# Use `Group-Object -NoElement` to count unique values - you could also use `Sort-Object -Unique`
'Count of distinct Secondary Identifiers' = @($_.Group |Group-Object 'Secondary Identifier' -NoElement).Count
}
}
Upvotes: 2