PowerShell help: How can I remove duplicates (using multiple columns simultaneously, not sequentially)?

Question

I have tried several variations based on other Stack Overflow articles. Below is a sample of my data, the output I want, and some cobbled-together code, in the hope of getting some direction from the community:

C:\Scripts\contacts.csv:

id,first_name,last_name,email
1,john,smith,jsmith@notreal.com
1,jane,smith,jsmith@notreal.com
2,jane,smith,jsmith@notreal.com
2,john,smith,jsmith@notreal.com
3,sam,jones,sjones@notreal.com
3,sandy,jones,sandy@notreal.com

I need to turn this into a file where each value in the "email" column is unique within each "id". In other words, the same address may appear more than once, but only under different ids.
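To make the rule concrete, here is a minimal sketch (not part of the original question) that lists the id/email pairs occurring more than once. It inlines the sample data with ConvertFrom-Csv so it runs on its own; with the real file you would use Import-Csv C:\Scripts\contacts.csv instead. The variable names `$contacts` and `$dupes` are illustrative only.

```powershell
# Sample data from the question; in practice: $contacts = Import-Csv C:\Scripts\contacts.csv
$contacts = @'
id,first_name,last_name,email
1,john,smith,jsmith@notreal.com
1,jane,smith,jsmith@notreal.com
2,jane,smith,jsmith@notreal.com
2,john,smith,jsmith@notreal.com
3,sam,jones,sjones@notreal.com
3,sandy,jones,sandy@notreal.com
'@ | ConvertFrom-Csv

# Group on both columns at once; any group holding more than one row
# is an id/email pair that breaks the "unique email per id" rule
$dupes = $contacts | Group-Object -Property id, email | Where-Object Count -gt 1

# Show each offending pair (Name combines the id and email values) and how many rows share it
$dupes | Select-Object Name, Count
```

For the sample, this should report two pairs, one for id 1 and one for id 2, each with two rows. The id 3 rows use different addresses, so they are not duplicates.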

Desired output (C:\Scripts\contacts-trimmed.csv):

id,first_name,last_name,email
1,john,smith,jsmith@notreal.com
2,john,smith,jsmith@notreal.com
3,sam,jones,sjones@notreal.com
3,sandy,jones,sandy@notreal.com

I have tried this with a few different variations:

Import-Csv C:\Scripts\contacts.csv | sort first_name | Sort-Object -Property id,email -Unique | Export-Csv C:\Scripts\contacts-trim.csv -NoTypeInformation

Any help or direction would be most appreciated.

Mathias R. Jessen · Accepted Answer

You'll want to use the Group-Object cmdlet to, well, group together records that share the same id and email values:

$records = @'
id,first_name,last_name,email
1,john,smith,jsmith@notreal.com
1,jane,smith,jsmith@notreal.com
2,jane,smith,jsmith@notreal.com
2,john,smith,jsmith@notreal.com
3,sam,jones,sjones@notreal.com
3,sandy,jones,sandy@notreal.com
'@ |ConvertFrom-Csv

# group records based on id and email column
$records |Group-Object id,email |ForEach-Object {
  # grab only the first record from each group
  $_.Group |Select-Object -First 1
} |Export-Csv .\no_duplicates.csv -NoTypeInformation
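A streaming alternative, offered here as a sketch rather than as part of the accepted answer, is to remember each id/email key in a .NET HashSet and let Where-Object pass a row through only the first time its key is seen. It assumes that "|" never appears inside an id or email (it is used as the key separator) and uses a case-insensitive comparer to mirror Group-Object's default. Like the Group-Object version, it keeps whichever row comes first for each pair, so for the sample data id 2 keeps jane rather than the john row shown in the desired output; sort the rows beforehand if a particular row should win.

```powershell
# Sample data from the question; in practice: $contacts = Import-Csv C:\Scripts\contacts.csv
$contacts = @'
id,first_name,last_name,email
1,john,smith,jsmith@notreal.com
1,jane,smith,jsmith@notreal.com
2,jane,smith,jsmith@notreal.com
2,john,smith,jsmith@notreal.com
3,sam,jones,sjones@notreal.com
3,sandy,jones,sandy@notreal.com
'@ | ConvertFrom-Csv

# Keys already seen, compared case-insensitively (Group-Object's default behaviour)
$seen = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)

# HashSet.Add returns $true only when the key is new, so this keeps the first row
# for each id/email pair, drops later repeats, and preserves the input order
$trimmed = $contacts | Where-Object { $seen.Add("$($_.id)|$($_.email)") }

# Pipe $trimmed to Export-Csv ... -NoTypeInformation to write the result file
$trimmed
```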
