Reputation: 21
I’m trying to work with some duplicate data in Powershell and combine, I have an array with two fields Peoplename and Sitelabel the data looks like this
Peoplename Sitelabel
Bill Jones SL1, SL2, SL3
John Smith SL1, SL2, SL4
Claire Harris SL1, SL2, SL3
Fran Goddard SL1, SL4, SL3
John Smith SL1, SL2, SL3, SL4, SL5
Bill Jones SL1, SL2, SL3
John Smith SL1, SL2, SL3
I would like to end up with the following data
People Sitelabel
Bill Jones SL1, SL2, SL3, SL5, SL6, SL7
John Smith SL1, SL2, SL4, SL5
Claire Harris SL1, SL2, SL3
Fran Goddard SL1, SL4, SL3
I have used the following code to identify the duplicate People
for([int] $i=0;$i -lt $People.Peoplename.count;$i++)
{
for([int] $j=$i+1;$j -lt $ People.Peoplename.count;$j++)
{
if($People [$i].Peoplename -eq $ People [$j].Peoplename)
{
write-host "Duplicate item found" $People [$i].Peoplename
}
}
}
But I’m struggling on how to then check the Sitelabel
Would appreciate any advice / guidance
Upvotes: 1
Views: 789
Reputation: 32170
Note: This solution uses Cmdlets from the PowerShell Community Extensions module.
Your data do not match your expected output. Bill Jones, for example, doesn't have SL5, SL6, and SL7.
Still, here's what I'd do.
This is how I set up the data. I've no idea how you're getting it:
$Data = @'
"PeopleName","SiteLabel"
"Bill Jones","SL1, SL2, SL3"
"John Smith","SL1, SL2, SL4"
"Claire Harris","SL1, SL2, SL3"
"Fran Goddard","SL1, SL4, SL3"
"John Smith","SL1, SL2, SL3, SL4, SL5"
"Bill Jones","SL1, SL2, SL3"
"John Smith","SL1, SL2, SL3"
'@
$People = ConvertFrom-Csv $Data;
And this is how I manipulate it:
$PeopleCondensed = $People | Group-Object 'PeopleName' | ForEach-Object {
$_ | Select-Object @{n='PeopleName';e={$_.Name}}, @{n='SiteLabel';e={ `
$_.Group `
| Select-Object -ExpandProperty SiteLabel `
| Join-String -Separator ',' `
| Split-String -Separator ',' -RemoveEmptyStrings `
| ForEach-Object { $_.Trim(); } `
| Select-Object -Unique `
| Sort-Object `
| Out-String `
| Split-String -RemoveEmptyStrings `
| Join-String -Separator ', '; }};
}
$PeopleCondensed | Format-Table -AutoSize;
That gives me this:
PeopleName SiteLabel
---------- ---------
Bill Jones SL1, SL2, SL3
John Smith SL1, SL2, SL3, SL4, SL5
Claire Harris SL1, SL2, SL3
Fran Goddard SL1, SL3, SL4
It doesn't match your expected output, but that's because the data don't represent your expected output.
You'll note that I didn't use any for
loops in favor of a foreach
loops. This is best practice for PowerShell. Mucking about with iterator variables is a waste of time. I also make liberal use of pipelines. This is also best practice for PowerShell.
Upvotes: 1
Reputation: 46710
I am making the same assumptions about your data as Bacon Bits. Your data and question should not derive the expected output. I use Group-Object
and calculated properties to get the same result as Bacon Bits. I am also working on the assumption that your data is a csv where Sitelabel is a comma delimited string. $data
is that imported csv object.
$data | Select PeopleName,@{Name="Sitelabel";Expression={$_.Sitelabel.trim() -split "\s?,\s?"}} |
Group-Object Peoplename |
Select @{Name="Peoplename";Expression={$_.Name}}, @{Name='Sitelabel';Expression={($_.Group.Sitelabel | Sort-Object -Unique) -join ", "}
}
We take the Sitelabel
of each user and -split
it into an array. Then Group-Object
groups by people and then calls that persons Sitelabel
s. We sort all of those alphabetically and drop any duplicates. You can export that back to the original file easy now.
Upvotes: 0