n8.
n8.

Reputation: 1738

Use PowerShell to open multiple Excel workbooks in parallel

I have a script that opens a set of Excel workbooks in a single Excel instance and updates connections and pivot caches. The end user reports that they usually open all the workbooks and refresh connections in parallel (as opposed to "in series" as the script does). It looks like parallel processing is possible with PowerShell but so far I'm unable to get Excel Workbooks to open using the script block and Start-Job method. All that happens is that the Excel instance is launched and the job runs forever without completing and no workbooks open.

Best effort so far:

$filePath = "C:\testpath"

#Array: FileName, SaveAs
$files = @()
$files += , @("tst1.xlsx", $false)
$files += , @("tst2.xlsx", $false)

Write-output "$(Get-Date -Format yyyy-MM-dd.hh:mm:ss): Starting loop"
$x1 = New-Object -ComObject "Excel.Application"
$x1.Visible = $true

ForEach($file in $files) {
    
    # Define what each job does
    $ScriptBlock = {

        param([Object[]]$x1,
              [string]$filePath,
              [array]$file) 

        $wb = $x1.workbooks.Open("$filePath\$($file[0])")
            
        ForEach ($cache in $wb.PivotCaches()) {
            #Refresh this cache
            $cache.Refresh()
        }
    }
        
    # Execute the jobs in parallel
    Start-Job $ScriptBlock -ArgumentList $x1, 
                                         $filePath,
                                         $file
}

Get-Job
 
# Wait for it all to complete
While (Get-Job -State "Running") {
    # Pause for 10 seconds
    Start-Sleep 10
}
 
# Getting the information back from the jobs
Get-Job | Receive-Job

Write-output "$(Get-Date -Format yyyy-MM-dd.hh:mm:ss):Loop complete"

All of this works just fine without parallel processing:

$filePath = "C:\testpath"

#Array: FileName, SaveAs
$files = @()
$files += , @("tst1.xlsx", $false)
$files += , @("tst2.xlsx", $false)

Write-output "$(Get-Date -Format yyyy-MM-dd.hh:mm:ss): Starting loop"
$x1 = New-Object -ComObject "Excel.Application"
$x1.Visible = $true

ForEach($file in $files) {
    
    $wb = $x1.workbooks.Open("$filePath\$($file[0])")
            
    ForEach ($cache in $wb.PivotCaches()) {
        #Refresh this cache
        $cache.Refresh()
    }
}

Clues on how to get Excel to launch the workbook within the script block? I put a code break in on the $wb = $x1.workbooks.Open("$filePath\$($file[0])") and the code doesn't stop, it just runs forever. This is the output returned:

Id Name PSJobTypeName State HasMoreData Location Command
1 Job1 BackgroundJob Running True localhost ...
3 Job3 BackgroundJob Running True localhost ...
1 Job1 BackgroundJob Running True localhost ...
3 Job3 BackgroundJob Running True localhost ...

Upvotes: 0

Views: 106

Answers (2)

n8.
n8.

Reputation: 1738

Here's how I kind of solved it. The issue is that the connection refresh could take a little time or a lot but PowerShell doesn't know how long to wait so it requires a chair/keyboard meatbag to complete the run.

#Establish important date variable strings
$evalDate = (Get-Date).tostring("yyyyMMdd")

$filePath = "C:\testpath"

#Array: FileName, SaveAs (doesn't get used because don't know how long refresh takes)
$files = @()
$files += , @("tst1.xlsx", $false)
$files += , @("tst2.xlsx", $false)

Write-output "$(Get-Date -Format yyyy-MM-dd.hh:mm:ss): Starting loop"

ForEach($file in $files) {
    #Get last modification date (so can be skipped if already updated today)
    $dateModfd = (Get-Item "$filePath\$($file[0])").LastWriteTime.tostring("yyyyMMdd")

    #Check to see if the file to be updated hasn't yet been updated today
    if($dateModfd -lt $evalDate) {

        # Define what each job does
        $ScriptBlock = {

            param([string]$filePath,
                  [array]$file,
                  [string]$evalDate) 

            $x1 = New-Object -ComObject "Excel.Application"
            $x1.Visible = $true
            
            $wb = $x1.workbooks.Open("$filePath\$($file[0])")
            
            Write-output "$(Get-Date -Format yyyy-MM-dd.hh:mm:ss): Refreshing connections for $($file[0])"
            ForEach ($objConnection in $wb.Connections) {
        
                #Refresh this connection
                $objConnection.Refresh()
                
                #If connection is still refreshing (query is running) then wait 2 seconds and check again
                #While ($wb.ODBCConnection.Refreshing) - doesn't work
                #{
                    Start-Sleep -Seconds 600
                #}
            }
            
            ForEach ($cache in $wb.PivotCaches()) {
                #Refresh this cache
                $cache.Refresh()
            }

        }
        
        # Execute the jobs in parallel
        Start-Job $ScriptBlock -ArgumentList $filePath,
                                             $file,
                                             $evalDate
                          
        # Pause for 10 seconds so Excel can load
        Start-Sleep 10

    } else {
        Write-output "$(Get-Date -Format yyyy-MM-dd.hh:mm:ss): $($file[0].ToString()) skipped because it was already modified today: $dateModfd. `n (Today: $evalDate)"
    }
}

Upvotes: 1

Frank D.
Frank D.

Reputation: 36

I hope this is what your looking for:

#Define the Filepath
$filePath = "C:\testpath\"

#Define the Files
$files = @("tst1.xlsx", "tst2.xlsx")

#Clear the workbootPaths Array
$workbookPaths = @()

foreach ($file in $files) {
    #Create an PSobject that hold the filepath and the state (state is alway $false)
    $wbary = New-Object -TypeName psobject -Property @{
        Filepath = $filepath + $file
        State    = $false
    }
    #Add the PSObject to the $workbookpaths array
    $workbookPaths += $wbary
}

# Define the Job
$scriptBlock = {
    param(
        [string]$filePath,
        [string]$workbookPath
    )

    $x1 = New-Object -ComObject Excel.Application
    $x1.Visible = $true

    $wb = $x1.Workbooks.Open($workbookPath)

    foreach ($cache in $wb.PivotCaches()) {
        $cache.Refresh()
    }
}

Write-output "$(Get-Date -Format yyyy-MM-dd.hh:mm:ss): Starting loop"

#Than start the job for each file
ForEach ($file in $workbookPaths) {        
    Start-Job -ScriptBlock $scriptBlock -ArgumentList $filePath, $file.Filepath
}

Get-Job

# Wait for it all to complete
While (Get-Job -State "Running") {
    # Pause for 1 seconds
    Start-Sleep 1
}
 
# Getting the information back from the jobs
Get-Job | Receive-Job

Write-output "$(Get-Date -Format yyyy-MM-dd.hh:mm:ss): Loop complete"

Upvotes: 1

Related Questions