Fimlore

Reputation: 167

I need to query a (4GB) variable in multiple jobs - passing the variable along causes memory to overflow

I'm currently trying to multithread the inventory of 5,000 groups and 30,000+ users. I took this information offline because I don't want to query our service provider over the internet for each group membership; the information is 'somewhat' available in a property when retrieving the groups, but that aside.

So I have two CliXML files that I import into variables; memory usage on the server is 4GB+. Using one ForEach loop to tie both arrays together takes roughly 30 seconds per group, so I wish to use jobs. But there comes my problem: each job would require an entire copy of both XML files to do its lookups, so each additional job fills memory with another 4GB.

I wanted to use something like a database (such as SQLite), but the rich properties of the PowerShell objects get lost because it doesn't support the object-oriented columns PowerShell does.

Write-Info "Starting jobs..."
$Start = Get-Date
$CurrentBatchStart = 0   # initialize once, before the loop, so each run picks up the next batch
For ($i = 0; $i -lt $runs; $i++) {
    $currentBatch = $MailSecurityGroups | Select-Object -First $BatchSize -Skip $CurrentBatchStart
    $CurrentBatchStart += $BatchSize

    #Limit the number of concurrent jobs: wait while more than 5 are running
    while ((Get-Job | Where-object State -eq "Running").count -gt 5) {
        write-Info "Waiting to start next job..."
        $prev = Get-Status -prev $prev
        Start-Sleep -Seconds 10
    }

    start-sleep -seconds (get-random -Maximum 5 -Minimum 1)
    Write-Info "Starting job ($i)..."
    Start-Job { 
        param([array]$CurrentBatch)

        Start-Transcript "$($using:WorkPath)\GroupExport\BatchOutput_Batch$($using:i).log"

        Write-Host  $using:MailSecurityGroups.Count
        Write-Host  $using:MailEnabledUsers.Count

        Foreach ($Group in $CurrentBatch) {
            $groupGUID = $Group.GUID.Guid
            [Array]$GroupMembers = @()
            $CurrentGroupMembersExportCSV = "$($using:WorkPath)\GroupExport\$($groupGUID).csv" 

            Write-Host "Processing [$($groupGUID)] - $($Group.DisplayName)"

            foreach ($Member in $Group.Members) {
                Write-Host "Processing $($member)"

                $FoundGroup = ($using:MailSecurityGroups | Where-Object {$_.Name -eq $Member.User -or $_.Name -eq $Member.Name -or $_.Name -eq $Member} )
                if ($FoundGroup) {
                    $CurrentObject = $FoundGroup | Select-Object *, @{N="ObjectType";E={"Group"}}
                    Write-Host "Found a group!"
                } else {
                    $FoundUser = $using:MailEnabledUsers | Where-Object {$_.UserPrincipalName -eq $Member.User -or $_.Identity -eq $Member}
                    if ($FoundUser) {
                        $CurrentObject = $FoundUser | Select-Object *, @{N="ObjectType";E={"User"}}
                        Write-Host "Found a User!"
                    } else {
                        Write-Host "Found Nothing.."
                        continue
                    }
                }
                
                Write-Host "$($CurrentObject.ObjectType)"
                if ($CurrentObject.ObjectType -eq "User") {
                    Write-Host  "User: GUID: [$($CurrentObject.GUID.Guid)] - ($($CurrentObject.Name))"
                    $UserObject = [PSCustomObject]@{
                        PrimarySMTPAddress = $CurrentObject.UserPrincipalName
                        GUID = $CurrentObject.Guid.Guid
                        MemberIdentity = $Member
                    }
                    $GroupMembers += $UserObject
                } else {
                    #Kick it out of the query, we will look at this later.
                    $NestedIssue = "$($using:WorkPath)\GroupExport\NESTEDGROUP_$($groupGUID).Csv"
                    Write-Host  $NestedIssue
                    $CurrentObject | Export-Csv $NestedIssue -Append -NoTypeInformation -Delimiter ";"
                }

            }

            # Export once per group, after all of its members have been processed
            $GroupMembers | Export-CSV $CurrentGroupMembersExportCSV -Delimiter ";" -NoTypeInformation
            $Group | Select-Object Guid | Export-CSV "$($using:WorkPath)\GroupExport\zz_processed_Batch$($using:i).csv" -Append -Delimiter ";"
        }

        Stop-Transcript
    } -ArgumentList (,$currentBatch) -Name "zz_Batch$($i)"
} 

As noted above, I tried to pass the variables along with $using:, and I also tried importing the XML into each job, but the issue is their size. I need some sort of 'centrally queryable variable' that is stored in memory just once.

Upvotes: 1

Views: 71

Answers (2)

iRon

Reputation: 23830

As described in PowerShell scripting performance considerations, repeatedly invoking a cmdlet such as Export-Csv inside a loop can get pretty expensive, because each call sets up a new pipeline and reopens the file. To avoid this, you might want to keep the CSV files open by creating multiple (steppable) pipelines. Unfortunately, I can't completely simulate your environment, but to achieve this, your script should look something like this:

$MailSecurityGroups | Foreach-Object -Begin {
    $CsvExports = @{}
    function ExportCsv($Path, $Object) {
        if(-not $CsvExports.Contains($Path)) { # Open a new pipeline (file)
            $CsvExports[$Path] = {
                Export-CSV -Path $Path -NoTypeInformation -Delimiter ";"
            }.GetSteppablePipeline()
            $CsvExports[$Path].Begin($true)
        }
        $CsvExports[$Path].Process($Object) # Export the object
    }
} -Process {
    $Group = $_
    Function Write-Info {
        param ($text)   
        Write-Host "[$(get-date)] $text"
    }
    Write-Info "Processing [$($Group.GUID.Guid)] - $($Group.DisplayName)"


    # $using: is only valid in jobs, remoting and -Parallel script blocks;
    # this ForEach-Object runs in the caller's scope, so $WorkPath is directly visible.
    $CurrentGroupMembersExportCSV = "$WorkPath\GroupExport\$($Group.Guid.Guid).csv"

    $Group.Members | Foreach-Object {
        $Member = $_
        Function Write-Info {
            param ($text)   
            Write-Host "[$(get-date)] $text"
        }
        Function Get-ObjectType {
            param ($PermissionObject)
            $FoundGroup = ($MailSecurityGroups | Where-Object {$_.Name -eq $PermissionObject.User -or $_.Name -eq $PermissionObject.Name -or $_.Name -eq $PermissionObject} )
            if ($FoundGroup) {
                return $FoundGroup | Select-Object *, @{N="ObjectType";E={"Group"}}
            } 
            $FoundUser = $MailEnabledUsers | Where-Object {$_.UserPrincipalName -eq $PermissionObject.User -or $_.Identity -eq $PermissionObject}
            if ($FoundUser) {
                return $FoundUser | Select-Object *, @{N="ObjectType";E={"User"}}
            }
        }
        Write-Info "Processing $($Member)"
        
        $CurrentObject = Get-ObjectType $Member
        if ($CurrentObject.ObjectType -eq "User") {
            Write-Info "User: GUID: [$($CurrentObject.GUID)] - ($($CurrentObject.Name))"
            $UserObject = [PSCustomObject]@{
                PrimarySMTPAddress = $CurrentObject.UserPrincipalName
                GUID = $CurrentObject.Guid
                MemberIdentity = $Member
            }
            # $UserObject | Export-CSV $using:CurrentGroupMembersExportCSV -Delimiter ";" -NoTypeInformation -Append
            ExportCsv -Path $CurrentGroupMembersExportCSV -Object $UserObject
        } else {
            $NestedIssue = "$WorkPath\GroupExport\NESTEDGROUP_$($Group.GUID.Guid).Csv"
            Write-Host  $NestedIssue
            # $CurrentObject | Export-Csv $NestedIssue -Append -NoTypeInformation -Delimiter ";"
            ExportCsv -Path $NestedIssue -Object $CurrentObject
        }
    }
} -End {
    $CsvExports.Values.foreach{ $_.End() } # Close all pipelines (files)
}

For more background, see: Mastering the (steppable) pipeline
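
In isolation, the steppable-pipeline technique used above boils down to something like the following. This is only a minimal sketch with made-up sample data and a hypothetical temp-file path ($path, $rows and the Id/Name properties are illustrative assumptions, not part of your environment):

```powershell
# Hypothetical sample data and output path, for illustration only
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'steppable_demo.csv'
$rows = 1..3 | ForEach-Object { [PSCustomObject]@{ Id = $_; Name = "User$_" } }

# Create the pipeline once; Export-Csv opens the file in Begin() and keeps it open
$pipeline = { Export-Csv -Path $path -NoTypeInformation -Delimiter ';' }.GetSteppablePipeline()
$pipeline.Begin($true)          # $true: the pipeline expects input

foreach ($row in $rows) {
    $pipeline.Process($row)     # write one row without starting a new Export-Csv call
}

$pipeline.End()                 # flush and close the file

# Read the file back to check what was written
$written = Import-Csv -Path $path -Delimiter ';'
```

The point is that Begin() and End() run once per file, while Process() runs once per object, so the per-call overhead of Export-Csv -Append disappears.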

Upvotes: 0

Fimlore

Reputation: 167

Based on the comment of @mklement0 I've updated my code as below, using the v7+ ForEach-Object -Parallel feature instead of background jobs to achieve parallelism:

$MailSecurityGroups | Foreach-Object -Parallel {
    $Group = $_
    Function Write-Info {
        param ($text)   
        Write-Host "[$(get-date)] $text"
    }
    Write-Info "Processing [$($Group.GUID.Guid)] - $($Group.DisplayName)"


    #Pass the variables along
    $MailSecurityGroups = $using:MailSecurityGroups
    $MailEnabledUsers = $using:MailEnabledUsers
    $WorkPath = $using:WorkPath
    $CurrentGroupMembersExportCSV = "$($using:WorkPath)\GroupExport\$($Group.Guid.Guid).csv"

    $Group.Members | Foreach-Object -Parallel {
        $Member = $_
        $Group = $using:Group   # the inner runspace does not see the outer $Group by itself
        Function Write-Info {
            param ($text)   
            Write-Host "[$(get-date)] $text"
        }
        Function Get-ObjectType {
            param ($PermissionObject)
            $FoundGroup = ($using:MailSecurityGroups | Where-Object {$_.Name -eq $PermissionObject.User -or $_.Name -eq $PermissionObject.Name -or $_.Name -eq $PermissionObject} )
            if ($FoundGroup) {
                return $FoundGroup | Select-Object *, @{N="ObjectType";E={"Group"}}
            } 
            $FoundUser = $using:MailEnabledUsers | Where-Object {$_.UserPrincipalName -eq $PermissionObject.User -or $_.Identity -eq $PermissionObject}
            if ($FoundUser) {
                return $FoundUser | Select-Object *, @{N="ObjectType";E={"User"}}
            }
        }
        Write-Info "Processing $($Member)"
        
        $CurrentObject = Get-ObjectType $Member
        if ($CurrentObject.ObjectType -eq "User") {
            Write-Info "User: GUID: [$($CurrentObject.GUID)] - ($($CurrentObject.Name))"
            $UserObject = [PSCustomObject]@{
                PrimarySMTPAddress = $CurrentObject.UserPrincipalName
                GUID = $CurrentObject.Guid
                MemberIdentity = $Member
            }
            $UserObject | Export-CSV $using:CurrentGroupMembersExportCSV -Delimiter ";" -NoTypeInformation -Append
        } else {
            $NestedIssue = "$($using:WorkPath)\GroupExport\NESTEDGROUP_$($Group.GUID.Guid).Csv"
            Write-Host  $NestedIssue
            $CurrentObject | Export-Csv $NestedIssue -Append -NoTypeInformation -Delimiter ";"
        }
    } -ThrottleLimit 5
} -ThrottleLimit 10

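The throttle limits still need some tuning, but this seems way faster and no additional memory is being consumed, since the parallel runspaces live in the same process and work on the same in-memory objects instead of each getting their own copy.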
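
A minimal sketch of why memory stays flat, assuming PowerShell 7+: with ForEach-Object -Parallel, $using: gives each runspace a reference to the caller's object, whereas Start-Job serializes a copy into a separate process. The $shared dictionary and the "item" keys are made up purely for illustration:

```powershell
# A thread-safe dictionary created once in the caller's runspace (illustrative name)
$shared = [System.Collections.Concurrent.ConcurrentDictionary[string,int]]::new()

1..4 | ForEach-Object -Parallel {
    # $using: yields a reference to the caller's object, not a copy
    $dict = $using:shared
    $null = $dict.TryAdd("item$_", $_)
} -ThrottleLimit 2

# The additions made inside the parallel runspaces are visible here,
# which would not be the case if each runspace had received its own copy.
$count = $shared.Count
```

Because every runspace sees the same object, it should either only be read concurrently (as with the imported lookup arrays) or be a thread-safe type, as in this sketch.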

Upvotes: 1
