Reputation: 63816
Tasks (the TPL) have been the recommended way of performing parallelism since .NET 4.0, as they are a higher-level abstraction and let the runtime optimise things better.
But in a scenario where all work units must happen at once, is the TPL still the best option?
My use-case is to spawn multiple instances (approximately 10) of PsExec in order to simultaneously run the same process on multiple remote PCs, and wait for each instance to exit. Any 'optimization' by the TPL which resulted in not running all instances in parallel would be disastrous.
Does this use-case fall outside the scope of the TPL, so that I'd be better off just launching threads myself?
I'm aware that you cannot execute more threads simultaneously than you have cores, but Windows will run more threads than cores by time-slicing. That is acceptable; what is not acceptable is any thread being scheduled not to run until others have completed.
Upvotes: 1
Views: 267
Reputation: 131722
The TPL's behavior isn't really relevant to your scenario - you don't need the TPL to spawn X command-line processes in parallel, you can do it with a simple for loop. Process.Start doesn't wait for the process to terminate and returns as soon as the process is spawned.
The time it takes for psexec to connect to a remote machine and spawn a process there is so large that you'll be able to spawn dozens if not hundreds of processes before the first remote machine starts to process the request.
If you absolutely must start thousands of processes and the few milliseconds' delay of a for loop won't do, you can use Task.Run(() => Process.Start(...)) to spawn multiple processes in parallel. You'd have to collect the Process objects returned by all the Task.Run calls in order to monitor them for completion.
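For example, a sketch of collecting the spawned Process objects (machine names and the command are placeholders; error handling omitted):

```csharp
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

var machines = new[] { "PC01", "PC02", "PC03" };  // hypothetical machine names

// Each Task.Run call returns a Task<Process>; collect them all
var spawnTasks = machines
    .Select(m => Task.Run(() => Process.Start("psexec.exe", $@"\\{m} myapp.exe")))
    .ToArray();

// Wait until all processes have been spawned
var processes = await Task.WhenAll(spawnTasks);

// Then wait for all instances to exit
await Task.WhenAll(processes.Select(p => Task.Run(() => p.WaitForExit())));
```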
Spawning a process, though, is a lot more expensive than making the network call directly from your code. You can create remote sessions, e.g. as shown here, and execute pipelines (commands) remotely.
You can use InvokeAsync instead of Invoke to start executing each pipeline asynchronously, either in a for loop or using the TPL. To detect if a command has finished, you need to monitor the pipeline's PipelineStateInfo property or subscribe to its StateChanged event.
You can use a TaskCompletionSource to wrap the event and wait on all pipelines for completion.
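A sketch of wrapping the StateChanged event in a TaskCompletionSource, assuming a reference to System.Management.Automation; the command string passed in is a placeholder:

```csharp
using System.Management.Automation.Runspaces;
using System.Threading.Tasks;

static Task RunPipelineAsync(Runspace runspace, string command)
{
    var pipeline = runspace.CreatePipeline(command);
    var tcs = new TaskCompletionSource<object>();

    // Translate pipeline state transitions into task completion
    pipeline.StateChanged += (sender, e) =>
    {
        switch (e.PipelineStateInfo.State)
        {
            case PipelineState.Completed:
                tcs.TrySetResult(null);
                break;
            case PipelineState.Failed:
                tcs.TrySetException(e.PipelineStateInfo.Reason);
                break;
            case PipelineState.Stopped:
                tcs.TrySetCanceled();
                break;
        }
    };

    pipeline.InvokeAsync();  // starts executing without blocking the caller
    return tcs.Task;
}
```

You could then create one task per remote session and use Task.WhenAll to wait for all of them to complete.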
EDIT
Perhaps a better option would be to schedule jobs on the remote computers to run at a specific point in time, by executing Start-Job, rather than trying to spawn all the processes at the same time. This avoids a lot of orchestration headaches.
Yet another option is to have PowerShell itself execute the commands in parallel using PowerShell workflows. Workflows also allow you to execute the same script on all items in a collection in parallel.
EDIT 2
It seems PowerShell workflows already support spawning scripts on multiple computers simply by using the PSComputerName parameter. Copied from the docs:
The following commands run the Test-Workflow workflow on hundreds
of computers. The first command gets the computer names from a text
file and saves them in the $Servers variable on the local computer.
The second command uses the Using scope modifier to indicate that
the $Servers variable is defined in the local session.
PS C:\> $Servers = Get-Content Servers.txt
PS C:\> Invoke-Command -Session $ws {Test-Workflow -PSComputerName $Using:Servers }
Upvotes: 1
Reputation: 171246
It is at the discretion of the TPL when it starts your tasks and on what thread. If the thread pool happens to be slow to inject new threads at that moment, your tasks can be delayed by many seconds.
By using TaskCreationOptions.LongRunning you can make the current versions of the TPL create a new thread for that task immediately. Clearly, you still don't have any guarantees regarding simultaneous execution, but it seems approximately simultaneous execution is enough for you.
In my estimation, TaskCreationOptions.LongRunning is effectively guaranteed to create a new thread in future versions as well, for reasons of compatibility. Apps have surely come to rely on details such as thread IDs and thread-local state. This can never be changed (given the history of high-compatibility releases that .NET has).
You should prefer a TaskCreationOptions.LongRunning Task over a Thread because it composes better with other code and has nicer error handling.
Upvotes: 1
Reputation: 13077
From the documentation:
The TPL scales the degree of concurrency dynamically to most efficiently use all the processors that are available
but in your scenario,
"all work units must happen at once"
So there it is: there is no guarantee that all processes are parallelized. The amount of parallelization will depend on the amount of resources you have (in this case processors/threads). And even with sufficient resources, success will be hindered by the number of units you need to parallelize.
Additionally :
The TPL uses thread pools, which means your work is queued to a thread in the thread pool. But you state:
"but scheduling any thread not to run until others have completed is not"
This could be violated when you have more work units than the available number of threads in the thread pool.
My opinion is that handling basic threads yourself would be appropriate for such a delicate/sensitive task.
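A minimal sketch of the plain-thread approach; the machine names and remote command are placeholders:

```csharp
using System.Diagnostics;
using System.Threading;

var machines = new[] { "PC01", "PC02" };  // hypothetical machine names
var threads = new Thread[machines.Length];

for (int i = 0; i < machines.Length; i++)
{
    var machine = machines[i];
    threads[i] = new Thread(() =>
    {
        // Each dedicated thread spawns one psexec instance and blocks until it exits
        using var p = Process.Start("psexec.exe", $@"\\{machine} myapp.exe");
        p.WaitForExit();
    });
    threads[i].Start();  // the thread starts immediately; no thread-pool queuing
}

foreach (var t in threads)
{
    t.Join();
}
```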
Upvotes: 1