user3716527
user3716527

Reputation: 21

Powershell duplicate data and combine

I’m trying to work with some duplicate data in Powershell and combine, I have an array with two fields Peoplename and Sitelabel the data looks like this

Peoplename      Sitelabel
Bill Jones      SL1, SL2, SL3
John Smith      SL1, SL2, SL4
Claire Harris   SL1, SL2, SL3           
Fran Goddard    SL1, SL4, SL3           
John Smith      SL1, SL2, SL3, SL4, SL5
Bill Jones      SL1, SL2, SL3
John Smith      SL1, SL2, SL3

I would like to end up with the following data

People          Sitelabel
Bill Jones      SL1, SL2, SL3, SL5, SL6, SL7
John Smith      SL1, SL2, SL4, SL5
Claire Harris   SL1, SL2, SL3           
Fran Goddard    SL1, SL4, SL3           

I have used the following code to identify the duplicate People

for([int] $i=0;$i -lt $People.Peoplename.count;$i++)
{
     for([int] $j=$i+1;$j -lt $ People.Peoplename.count;$j++)
     {
        if($People [$i].Peoplename -eq $ People [$j].Peoplename)
        {
             write-host "Duplicate item found" $People [$i].Peoplename
        }

    }
} 

But I’m struggling on how to then check the Sitelabel

Would appreciate any advice / guidance

Upvotes: 1

Views: 789

Answers (2)

Bacon Bits
Bacon Bits

Reputation: 32170

Note: This solution uses Cmdlets from the PowerShell Community Extensions module.

Your data do not match your expected output. Bill Jones, for example, doesn't have SL5, SL6, and SL7.

Still, here's what I'd do.

This is how I set up the data. I've no idea how you're getting it:

$Data = @'
"PeopleName","SiteLabel"
"Bill Jones","SL1, SL2, SL3"
"John Smith","SL1, SL2, SL4"
"Claire Harris","SL1, SL2, SL3"
"Fran Goddard","SL1, SL4, SL3"
"John Smith","SL1, SL2, SL3, SL4, SL5"
"Bill Jones","SL1, SL2, SL3"
"John Smith","SL1, SL2, SL3"
'@

$People = ConvertFrom-Csv $Data;

And this is how I manipulate it:

$PeopleCondensed = $People | Group-Object 'PeopleName' | ForEach-Object {
    $_ | Select-Object @{n='PeopleName';e={$_.Name}}, @{n='SiteLabel';e={ `
        $_.Group `
            | Select-Object -ExpandProperty SiteLabel `
            | Join-String -Separator ',' `
            | Split-String -Separator ',' -RemoveEmptyStrings `
            | ForEach-Object { $_.Trim(); } `
            | Select-Object -Unique `
            | Sort-Object `
            | Out-String `
            | Split-String -RemoveEmptyStrings `
            | Join-String -Separator ', '; }};

}

$PeopleCondensed | Format-Table -AutoSize;

That gives me this:

PeopleName    SiteLabel              
----------    ---------              
Bill Jones    SL1, SL2, SL3          
John Smith    SL1, SL2, SL3, SL4, SL5
Claire Harris SL1, SL2, SL3          
Fran Goddard  SL1, SL3, SL4          

It doesn't match your expected output, but that's because the data don't represent your expected output.

You'll note that I didn't use any for loops in favor of a foreach loops. This is best practice for PowerShell. Mucking about with iterator variables is a waste of time. I also make liberal use of pipelines. This is also best practice for PowerShell.

Upvotes: 1

Matt
Matt

Reputation: 46710

I am making the same assumptions about your data as Bacon Bits. Your data and question should not derive the expected output. I use Group-Object and calculated properties to get the same result as Bacon Bits. I am also working on the assumption that your data is a csv where Sitelabel is a comma delimited string. $data is that imported csv object.

$data | Select PeopleName,@{Name="Sitelabel";Expression={$_.Sitelabel.trim() -split "\s?,\s?"}} |
    Group-Object Peoplename | 
    Select @{Name="Peoplename";Expression={$_.Name}}, @{Name='Sitelabel';Expression={($_.Group.Sitelabel | Sort-Object -Unique) -join ", "}
}

We take the Sitelabel of each user and -split it into an array. Then Group-Object groups by people and then calls that persons Sitelabels. We sort all of those alphabetically and drop any duplicates. You can export that back to the original file easy now.

Upvotes: 0

Related Questions