immobile2
immobile2

Reputation: 569

Count or Find Unique Values for a specific key

I feel like this shouldn't be that hard, but I'm having trouble getting a data structure in mind that will give me what I want. I have a large amount of data and I need to find the instances where there are multiple Secondary Identifiers as defined below.

Primary Identifier,Secondary Identifier,Fruit
11111,1,apple
11111,1,pear
22222,1,banana
22222,1,grapefruit
33333,1,apple
33333,1,pear
33333,2,apple
33333,2,orange

That might not be a great example to use - but basically only two of the columns matter. What I'd really like is to return the Primary Identifiers where the unique count of Secondary Identifiers is greater than 1. So I'm thinking maybe a HashTable would be my best bet, but I tried to doing something in a pipeline oriented way and failed so I'm wondering if there is an easier method or Cmdlet that I haven't tried.

The final array (or hashtable) would be something like this:

ID      Count of Secondary ID
-----   ---------------------
11111   1
22222   1
33333   2

At that point, getting the instances of multiple would be as easy as $array | Where-Object {$_."Count of Secondary ID" -gt 1}

If this example sucks or what I'm after doesn't make sense, let me know and I can rewrite it; but it's almost like I need an implementation of Select-Object -Unique that would allow you to use two or more input objects/columns. Basically the same as Excel's remove duplicates and then selecting which headers to include. Except there are too many rows to open in Excel

Upvotes: 1

Views: 1448

Answers (1)

Mathias R. Jessen
Mathias R. Jessen

Reputation: 174515

Use Group-Object twice - first to group the objects by common Primary Identifier, then use Group-Object again to count the number of distinct Secondary Identifier's within each group:

$data = @'
Primary Identifier,Secondary Identifier,Fruit
11111,1,apple
11111,1,pear
22222,1,banana
22222,1,grapefruit
33333,1,apple
33333,1,pear
33333,2,apple
33333,2,orange
'@ |ConvertFrom-Csv 

$data |Group-Object 'Primary Identifier' |ForEach-Object {
  [pscustomobject]@{
    # Primary Identifier value will be the name of the group, since that's what we grouped by
    'Primary Identifier' = $_.Name
    # Use `Group-Object -NoElement` to count unique values - you could also use `Sort-Object -Unique`
    'Count of distinct Secondary Identifiers' = @($_.Group |Group-Object 'Secondary Identifier' -NoElement).Count
  }
}

Upvotes: 2

Related Questions