Andy
Andy

Reputation: 3

Powershell Help: How can I remove duplicates (using multiple columns simultaneously, not sequentially)?

I have tried several different variations based on some other stack overflow articles, but I will share a sample of what I have and a sample output and then some cobbled-together code hoping for some direction from the community:

C:\Scripts\contacts.csv:

id,first_name,last_name,email
1,john,smith,[email protected]
1,jane,smith,[email protected]
2,jane,smith,[email protected]
2,john,smith,[email protected]
3,sam,jones,[email protected]
3,sandy,jones,[email protected]

Need to turn this into a file where column "email" is unique to column "id". In other words there can be duplicate addresses, but only if there is a different id.

desired output C:\Scripts\contacts-trimmed.csv:

id,first_name,last_name,email
1,john,smith,[email protected]
2,john,smith,[email protected]
3,sam,jones,[email protected]
3,sandy,jones,[email protected]

I have tried this with a few different variations:

Import-Csv C:\Scripts\contacts.csv | sort first_name | Sort-Object -Property id,email -Unique | Export-Csv C:\Scripts\contacts-trim.csv -NoTypeInformation

Any help or direction would be most appreciated

Upvotes: 0

Views: 740

Answers (1)

Mathias R. Jessen
Mathias R. Jessen

Reputation: 174545

You'll want to use the Group-Object cmdlet, to, well, group together records with similar values:

$records = @'
id,first_name,last_name,email
1,john,smith,[email protected]
1,jane,smith,[email protected]
2,jane,smith,[email protected]
2,john,smith,[email protected]
3,sam,jones,[email protected]
3,sandy,jones,[email protected]
'@ |ConvertFrom-Csv

# group records based on id and email column
$records |Group-Object id,email |ForEach-Object {
  # grab only the first record from each group
  $_.Group |Select-Object -First 1
} |Export-Csv .\no_duplicates.csv -NoTypeInformation

Upvotes: 1

Related Questions