Assimilater

Reputation: 964

How the parameter `NumberOfConcurrentThreads` is used in `CreateIoCompletionPort`

Looking at the MSDN documentation for CreateIoCompletionPort we read:

NumberOfConcurrentThreads [in]

The maximum number of threads that the operating system can allow to concurrently process I/O completion packets for the I/O completion port. This parameter is ignored if the ExistingCompletionPort parameter is not NULL.

If this parameter is zero, the system allows as many concurrently running threads as there are processors in the system.

However, the documentation I find gives no indication that the IoCompletionPort actually creates any threads.

In fact, the example Microsoft provides uses the following code to determine how many processing cores are available (instead of passing 0 in for NumberOfConcurrentThreads) and then actually creates that many threads.

// The general value of the thread count is the system's processor count.
SYSTEM_INFO sysInfo = { 0 };
GetNativeSystemInfo(&sysInfo);
const DWORD dwThreadCount = sysInfo.dwNumberOfProcessors;

// A class in the example that wraps around IoCompletionPort
IOCompletionPort port; 

// Construct the thread pool
HANDLE* hThreads = new HANDLE[dwThreadCount];
for (DWORD i = 0; i < dwThreadCount; ++i) {
    // The threads run CompletionThread
    hThreads[i] = CreateThread(0, 0, IOCompletionThread, &port, 0, NULL);
}

This seems to me to indicate that there is somehow a "carrying capacity" associated with the IoCompletionPort. But how does this manifest itself? I have a hard time seeing how a thread with access to the completion port would be prevented from dequeuing from it (or even why that would be desirable).

In fact, I tried modifying the line that creates the threads to be new HANDLE[++dwThreadCount] (and removing the const specifier from the declaration) and the example seemed to execute without complaints. I just noticed one extra timeout error at the conclusion of execution.

My only current conclusion would be to say that NumberOfConcurrentThreads is a "dummy" variable with no practical usage, so what am I missing?

Upvotes: 2

Views: 1110

Answers (2)

RbMm

Reputation: 33754

IOCP is based on the KQUEUE object:

struct KQUEUE {
    DISPATCHER_HEADER Header;
    LIST_ENTRY EntryListHead;
    ULONG CurrentCount;
    ULONG MaximumCount;
    LIST_ENTRY ThreadListHead;
};

MaximumCount is set to KeNumberProcessors if the queue is initialized by KeInitializeQueue with Count == 0, or to NumberOfConcurrentThreads if it is initialized from CreateIoCompletionPort (ZwCreateIoCompletion).

CurrentCount is the number of 'active' (not waiting) threads bound to this KQUEUE (the ETHREAD structure has a special field for this: KQUEUE* Queue).

If a thread tries to remove a packet from KQUEUE by calling KeRemoveQueue or ZwRemoveIoCompletion (GetQueuedCompletionStatus) and no packets are in the IOCP (EntryListHead is empty), then of course the thread will enter a wait state.

But if a packet exists, the system compares CurrentCount with MaximumCount. If CurrentCount < MaximumCount, a packet is removed and CurrentCount is incremented. Otherwise, the thread enters a wait state.

If a thread inserts a new packet into an IOCP on which other threads are waiting, one of them will be awoken (in LIFO order) only if CurrentCount < MaximumCount.

When:

  • A thread begins waiting on some object (for example, via KeWaitForSingleObject)
  • A thread is suspended (internally, this is also a wait)
  • KeDelayExecution (Sleep) is called

the system looks at Thread->Queue and, if it is not 0, decrements CurrentCount. Additionally, if packets exist in the IOCP, threads are waiting on it, and CurrentCount < MaximumCount, then one waiting thread will be awoken.

So the logic is actually quite complex, but the main point is that no more than MaximumCount threads will be able to process packets from an IOCP at once.

Usually, the best value for MaximumCount is the number of processors (KeNumberProcessors), but in some special cases profiling tools may help you decide on a different value that suits your situation better.
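To make the counting rules above concrete, here is a minimal single-threaded sketch of the bookkeeping. It is a toy model, not the kernel's implementation: the field names mirror KQUEUE, but ToyQueue and all of its methods are hypothetical, and it assumes that a thread which finished a packet stops counting as active when it asks for the next one.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy, single-threaded model of the counters described above. This is NOT
// kernel code: field names mirror KQUEUE, every method name is hypothetical.
struct ToyQueue {
    std::size_t currentCount = 0;  // threads currently processing packets
    std::size_t maximumCount;      // the NumberOfConcurrentThreads value
    std::size_t waiters = 0;       // threads blocked in the dequeue call
    std::deque<int> packets;       // stands in for EntryListHead

    explicit ToyQueue(std::size_t maxCount) : maximumCount(maxCount) {}

    // A thread asks for a packet (as GetQueuedCompletionStatus would).
    // callerWasActive: the caller just finished a packet, so it stops
    // counting as active first (an assumption of this model).
    // Returns true if the caller gets a packet, false if it must wait.
    bool dequeue(bool callerWasActive) {
        if (callerWasActive) {
            assert(currentCount > 0);
            --currentCount;
        }
        if (handOutPacket()) return true;
        ++waiters;
        return false;
    }

    // A packet is queued. Returns true if a waiting thread was released.
    bool enqueue(int packet) {
        packets.push_back(packet);
        return releaseWaiter();
    }

    // An active thread blocks on something else (Sleep, a wait, suspension).
    // Returns true if a waiting thread was released in its place.
    bool blockOnOtherObject() {
        assert(currentCount > 0);
        --currentCount;
        return releaseWaiter();
    }

    // The blocked thread resumes; the count may briefly exceed the limit.
    void resumeFromOtherWait() { ++currentCount; }

private:
    // Give one packet to a thread if the concurrency limit allows it.
    bool handOutPacket() {
        if (packets.empty() || currentCount >= maximumCount) return false;
        packets.pop_front();
        ++currentCount;
        return true;
    }

    // Release one waiting thread, but only if it can be handed a packet.
    bool releaseWaiter() {
        if (waiters == 0 || !handOutPacket()) return false;
        --waiters;
        return true;
    }
};
```

With a limit of 1 and two waiting threads, the first enqueue releases one waiter, a second enqueue releases nobody (the packet just stays queued), and a later blockOnOtherObject releases the remaining waiter. That is the "carrying capacity" the question asks about: extra threads are allowed to exist, they are simply not released while the limit is reached.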

Upvotes: 2

Remy Lebeau

Reputation: 597285

The I/O Completion Port itself does not create threads. The NumberOfConcurrentThreads parameter specifies how many threads are allowed to process completion packets at the same time. This is explained in more detail on another MSDN page:

I/O Completion Ports

How I/O Completion Ports Work

...

Although any number of threads can call GetQueuedCompletionStatus for a specified I/O completion port, when a specified thread calls GetQueuedCompletionStatus the first time, it becomes associated with the specified I/O completion port until one of three things occurs: The thread exits, specifies a different I/O completion port, or closes the I/O completion port. In other words, a single thread can be associated with, at most, one I/O completion port.

When a completion packet is queued to an I/O completion port, the system first checks how many threads associated with that port are running. If the number of threads running is less than the concurrency value (discussed in the next section), one of the waiting threads (the most recent one) is allowed to process the completion packet. When a running thread completes its processing, it typically calls GetQueuedCompletionStatus again, at which point it either returns with the next completion packet or waits if the queue is empty.

...

Threads and Concurrency

The most important property of an I/O completion port to consider carefully is the concurrency value. The concurrency value of a completion port is specified when it is created with CreateIoCompletionPort via the NumberOfConcurrentThreads parameter. This value limits the number of runnable threads associated with the completion port. When the total number of runnable threads associated with the completion port reaches the concurrency value, the system blocks the execution of any subsequent threads associated with that completion port until the number of runnable threads drops below the concurrency value.

The most efficient scenario occurs when there are completion packets waiting in the queue, but no waits can be satisfied because the port has reached its concurrency limit. Consider what happens with a concurrency value of one and multiple threads waiting in the GetQueuedCompletionStatus function call. In this case, if the queue always has completion packets waiting, when the running thread calls GetQueuedCompletionStatus, it will not block execution because, as mentioned earlier, the thread queue is LIFO. Instead, this thread will immediately pick up the next queued completion packet. No thread context switches will occur, because the running thread is continually picking up completion packets and the other threads are unable to run.

...

The best overall maximum value to pick for the concurrency value is the number of CPUs on the computer. If your transaction required a lengthy computation, a larger concurrency value will allow more threads to run. Each completion packet may take longer to finish, but more completion packets will be processed at the same time. You can experiment with the concurrency value in conjunction with profiling tools to achieve the best effect for your application.

The system also allows a thread waiting in GetQueuedCompletionStatus to process a completion packet if another running thread associated with the same I/O completion port enters a wait state for other reasons, for example the SuspendThread function. When the thread in the wait state begins running again, there may be a brief period when the number of active threads exceeds the concurrency value. However, the system quickly reduces this number by not allowing any new active threads until the number of active threads falls below the concurrency value. This is one reason to have your application create more threads in its thread pool than the concurrency value. Thread pool management is beyond the scope of this topic, but a good rule of thumb is to have a minimum of twice as many threads in the thread pool as there are processors on the system. For additional information about thread pooling, see Thread Pools.
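As a rough illustration of that rule of thumb, the sketch below derives both numbers from the processor count: a concurrency value equal to the number of processors, and a pool of twice that many threads. PoolSizing and suggestSizing are hypothetical names made up for this example; they are not part of any Windows API.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper, for illustration only.
struct PoolSizing {
    unsigned concurrency;  // value to pass as NumberOfConcurrentThreads
    unsigned poolThreads;  // number of worker threads to create
};

// Applies the quoted rule of thumb: concurrency = processors,
// pool size = 2 * processors.
PoolSizing suggestSizing(unsigned processors = std::thread::hardware_concurrency()) {
    // hardware_concurrency() may return 0 when the count is unknown.
    processors = std::max(1u, processors);
    return PoolSizing{processors, processors * 2};
}
```

The extra threads are not wasted: they stay blocked in GetQueuedCompletionStatus and are released only when a running thread enters a wait state, which keeps the number of active threads near the concurrency value.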

Upvotes: 3
