Reputation: 751

Relinquish Processor Time in C++ (Windows)

I've looked around a fair amount and can't seem to find what I'm looking for, but let me first stress that I'm not looking for a high-precision sleep function.

Here's the background for the problem I'm trying to solve:

I've made a memory mapping library that operates a lot like a named pipe. You can put bytes into it, get bytes out of it, and query how many bytes are available to read/write, all that good stuff.

It's fast (mostly): processes communicating through it average 4 GB/s when passing chunks of 8 KB or larger. Throughput drops to around 300 MB/s as the chunk size approaches 512 B.

The problem:

Very occasionally, on heavily loaded servers, very large lag times occur (upwards of 5 s). My running theory is that during large transfers (larger than the mapped memory), the writing process tight-polls while it waits for more space to become available in the circular buffer implemented on top of the memory map. There are no calls to sleep, so the polling process could be hogging the CPU for no good reason! The issue is that even the shortest call to sleep (1 ms) would absolutely demolish performance. The memory map is 16 KB, so if the writer slept for 1 ms every 16 KB, throughput would drop to a best case of 16 MB/s.
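The 16 MB/s figure can be checked with a few lines of arithmetic. This is only a sketch: the helper name `max_bytes_per_second` is made up for illustration, and it assumes the copy itself costs nothing, so the result is an upper bound.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: upper bound on throughput (bytes per second) when the
// writer sleeps once per full buffer and the copy itself is treated as free.
inline std::uint64_t max_bytes_per_second(std::uint64_t buffer_bytes,
                                          std::uint64_t sleep_microseconds) {
    assert(sleep_microseconds > 0);  // avoid dividing by zero
    return buffer_bytes * 1000000ULL / sleep_microseconds;
}
```

With a 16 KB buffer and a 1 ms sleep this gives 16,384,000 bytes/s. If the sleep actually lasts a full 15.625 ms timer tick, as one of the answers below reports for Sleep(1), the bound falls to 1,048,576 bytes/s.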

The solution:

I want a function I can call that relinquishes the CPU but places no restrictions on when the operating system (Windows 7 in this case) reschedules the thread.

Does anyone know whether such a function exists, or have any bright alternatives?

Thanks.

Upvotes: 2

Answers (6)

frr

Reputation: 446

I'm aware that this question is old - but, I'd like to add my 2c worth even after all those years, for the benefit of people coming after me...

How does sleep() yield the processor

If I take the headline at face value, that's exactly what I came looking for: how to voluntarily give up the CPU to the Windows scheduler, telling it to "get some other jobs done and come back when suitable". In my case, the "assignment" was: see if you can detect anomalous latencies in the system that do not correlate with the CPU load reported by the task manager and the performance counters. The suspicion was that something in the system is hiding its CPU consumption, or is making the CPU sleep extensively. As of this writing, this thread already contains a rich set of answers, pointers to further documentation etc., showing me several options for properly "yielding" the CPU.

In my particular case, I have written a small proggie that does the following: in a tightish inner loop, it keeps Sleep()ing. It asks the OS for a timestamp before and after the Sleep(), to check how long it was in fact "gone". BTW I'm using GetSystemTimeAsFileTime(), which returns a timestamp in units of 100 nanoseconds. The inner loop just keeps collecting stats, which are then reported once a second by an outer loop (well, actually just an if() block in the fast inner loop - you get the idea).
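The measurement loop described above could look roughly like the sketch below. It is not the author's actual program: the names `platform_sleep` and `average_sleep_ns` are invented here, `std::chrono::steady_clock` is swapped in for GetSystemTimeAsFileTime(), and outside Windows `std::this_thread::sleep_for` stands in for Sleep().

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: sleep for `ms` milliseconds using the platform call.
inline void platform_sleep(unsigned ms) {
#ifdef _WIN32
    Sleep(ms);  // WinAPI Sleep; 0 only yields to other ready threads
#else
    // Stand-in for non-Windows builds (assumption, not the author's code).
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
#endif
}

// Average observed duration, in nanoseconds, of `iterations` sleep calls.
inline std::int64_t average_sleep_ns(unsigned ms, int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;
    std::int64_t total_ns = 0;
    for (int i = 0; i < iterations; ++i) {
        const auto before = clock::now();
        platform_sleep(ms);
        const auto after = clock::now();
        total_ns += std::chrono::duration_cast<std::chrono::nanoseconds>(
                        after - before).count();
    }
    return total_ns / iterations;
}
```

Calling `average_sleep_ns(0, 100000)` and `average_sleep_ns(1, 100)` should be enough to compare the two cases discussed next; the absolute numbers will depend on the machine and the timer configuration.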

To get back on topic here, I have found out the actual behavior of WinAPI Sleep() for this purpose, as of Windows 10.

Sleep(0) does not wait until the next "process scheduler timer tick". Instead, it returns sooner - apparently, as soon as the scheduler has no other work in the queue at the same or higher priority. In the stats produced by my proggie, I can see an average Sleep(0) latency of about 180 ns. That includes my own code inside the tight loop. (Only the reporting printf() is excluded from the timing measurements.)

Sleep(1) on the other hand, does wait for the next timer-based scheduler pass. I.e. it does not attempt to wait for 1 ms exactly, or even almost. It waits for 1 ms at the very least, and possibly a lot longer. In my observation, the average latency is 15-16 ms. Which is consistent with the classic Windows default timer interrupt period of 15.625 ms (64 Hz) dating back to the i8254 PC AT timer.

I could probably make the scheduler timer tick faster, but that's not my point here... Just for the record, in a rich but old archived NNTP thread I learned about a function called timeBeginPeriod() (the link goes to current MS docs), which back in the day used to tweak the system-global scheduler timer tick. Since Windows 10 it allegedly no longer does that, and there has been a further update under Windows 11... Overall, Microsoft have made the timer granularity control more fine-grained: per process, and also dependent on the state of the window belonging to the process.

There is another WinAPI function (or macro) related to this, called YieldProcessor() (the link goes to MS Docs). I haven't tried this one, but I'd expect it to behave similarly to Sleep(0). The verb "yield" is the characteristic term for this action in process scheduling in general. For instance, in Linux the corresponding user-space function is called sched_yield(). In that context, I find the description in the Microsoft docs at the link above slightly suspicious. Let me quote:

Signals to the processor to give resources to threads that are waiting for them. This macro is only effective on processors that support technology allowing multiple threads running on a single processor, such as Intel's Hyperthreading technology.

I suspect that this description is wrong. Multi-threading works in a preemptive scheduler on a single processor core too, even without HT support. Or on a multi-core CPU without HT. Multi-threading is the job and discretion and feature of the process scheduler. A thread is just a lighter-weight process. Multiple threads in a process AKA task can share the same virtual memory space for instance. For most practical purposes, SMT/HT is mercifully obscured from view of the process scheduler - it is something happening in the hardware, sharing some deeper compute resources between "front end cores", the implementation / level of HW sharing varies by CPU vendor and model... and SMT/HT is not a requirement for multi-threading to work on the CPU. Actually, on a multi-core CPU (with or without SMT/HT), yielding can have less of a practical effect, if the concurrent threads / processes are running on different cores :-)
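A hedged aside on the quoted wording: to my knowledge, on x86 and x64 YieldProcessor() is a macro that expands to the `pause` instruction (the `_mm_pause` intrinsic). That is a hint to the CPU core inside a spin loop, not a call into the scheduler, which may be why the docs talk about hyperthreading. The sketch below shows that kind of spin-wait. The names `cpu_relax` and `spin_until_set` are invented for illustration, and on non-x86 targets the hint is simply left out.

```cpp
#include <atomic>
#include <cassert>
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define SKETCH_HAVE_PAUSE 1
#elif defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SKETCH_HAVE_PAUSE 1
#endif

// Hypothetical helper: tell the core we are spinning. This never enters the
// scheduler; the calling thread keeps its time slice.
inline void cpu_relax() {
#ifdef SKETCH_HAVE_PAUSE
    _mm_pause();
#endif
}

// Spin at most `max_spins` times waiting for `flag`; returns true if the flag
// was observed set. A caller would fall back to a real yield or wait on false.
inline bool spin_until_set(const std::atomic<bool>& flag, int max_spins) {
    assert(max_spins >= 0);
    for (int i = 0; i < max_spins; ++i) {
        if (flag.load(std::memory_order_acquire)) {
            return true;
        }
        cpu_relax();
    }
    return flag.load(std::memory_order_acquire);
}
```

If this reading is right, a loop like this helps a sibling hardware thread on the same core, but it does nothing to let another software thread run; that still takes Sleep(0), SwitchToThread() or a blocking wait.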

Side note: in Linux, you can investigate this topic all the way to a function in the kernel that's called schedule() . The link goes to a topic here at StackOverflow where I have dropped an answer...

Inter-process synchronization

And, a few notes on inter-process synchronization, which seems to be the actual topic that the OP has asked about, all those years ago: there are functions/objects/mechanisms for exactly that purpose.

In Windows, there is WaitForSingleObject() and friends. Collectively, they're called the SynchAPI. (The pointers lead to Microsoft Documentation.) And, there appears to be a neat tutorial chapter at learn.microsoft.com with code examples.

To be honest, I don't know the Microsoft side of IPC and thread synchronization very much. I have cut my teeth on the POSIX threads and locking API (in Linux): semaphores, mutexes, condition variables. I have learned about them from historical material that is no longer available, and wasn't exactly stellar at the time (it required me to combine several sources). For the moment, let me suggest a google query. Seems to return a cartload of plausible tutorials.

The traditional threading-101 wisdom suggests that for producer-consumer problems, the right synchronization primitive is a semaphore. This common wisdom is IMO oversimplified. Instead, the one specific tool that I suggest everyone look at is the condition variable. In the POSIX world, condition variables are intended to be used in combination with a mutex. They're the perfect primitive to protect a work queue, even in a broader sense of the word: a broader class of producer-consumer problems, with various combinations of producers vs. consumers (N:1, 1:M, M:N). The mutex is there to guarantee atomic manipulation of the shared data object (by mutual exclusion), and the condition variable allows consumers to wait for work to arrive without busy-waiting in a tight loop. Details can be tailored to fit the assignment. The sweet secret to me is a neat trick: pthread_cond_wait() returns with the mutex conveniently already locked by the thread that just got woken up :-) If you have multiple consumers waiting on the queue, you can use pthread_cond_broadcast() instead of pthread_cond_signal() etc. It really depends on your needs.
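To make the mutex plus condition variable pairing concrete, here is a minimal sketch. It uses the C++ standard library equivalents (`std::mutex`, `std::condition_variable`) rather than the pthread calls named above, and the `WorkQueue` class is invented for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>

// Hypothetical minimal work queue guarded by a mutex and a condition variable.
class WorkQueue {
public:
    void push(int item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(item);
        }
        // Wake one waiting consumer; notify_all() is the counterpart of
        // pthread_cond_broadcast().
        ready_.notify_one();
    }

    int pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // wait() releases the mutex while blocked and re-acquires it before
        // returning, so the queue is touched only while the lock is held.
        ready_.wait(lock, [this] { return !items_.empty(); });
        assert(!items_.empty());
        const int item = items_.front();
        items_.pop_front();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<int> items_;
};
```

The predicate passed to `wait()` also covers spurious wakeups: the consumer re-checks the queue every time it wakes and goes back to waiting if there is still nothing to take.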

I also recall some minor pitfall with "which thread wakes up first", the one signaling or the one that's been woken up... potentially leading to a race condition and a deadlock, depending on what your surrounding code does. Not sure anymore if this was in the user space (libpthread condition variables), or a similar primitive in the kernel... it was years ago. The WinAPI synchapi primitives may seem similar, but behave differently in some details. This whole topic is just beautiful.

Note that in the last decade or so, the POSIX idea of threading, including the sync primitives (mutexes, condition variables etc.), has crept into C++. They are now part of the C++ standard library! This means that it's up to the compiler vendor to wrap the OS-specific sync primitives in such a way that the one C++ way works across OS platforms. Interesting stuff - another step in turning C++ into a slightly higher-level language, compared to the "platform-independent assembler on steroids" (C PreProcessor) where it all started...

All of this "thread synchronization" stuff admittedly works only within one process, among its lightweight threads of execution. For proper inter-process synchronization, you will need some heavier-weight OS-specific mechanisms. For instance, POSIX has named semaphores which can be used between processes - probably a companion mechanism to SHM.

Upvotes: 0

MSalters

Reputation: 180295

You're incorrectly assuming a binary choice. Right now you always busy-wait, because always sleeping would be a bad idea.

The better solution is to try a few times without sleeping. If that still fails (because the map is full, and the other thread isn't running), then you can issue a true sleep. This will be sufficiently rare that on average you'll be sleeping microseconds. You could even check the time-stamp counter (RDTSC) to determine how long you've spent busy-waiting before surrendering your timeslice.
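A minimal sketch of that spin-then-sleep idea, under some assumptions: the function name `wait_spin_then_sleep` and the `spin_limit` parameter are made up, the fallback sleep is fixed at 1 ms, and `ready` stands for whatever check the library uses to see whether space has become available.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Poll `ready` up to `spin_limit` times without sleeping. If it is still
// false, sleep for 1 ms and start over. Returns how many times we slept.
template <typename Predicate>
int wait_spin_then_sleep(Predicate ready, int spin_limit) {
    assert(spin_limit > 0);
    int sleeps = 0;
    for (;;) {
        for (int i = 0; i < spin_limit; ++i) {
            if (ready()) {
                return sleeps;
            }
        }
        // The other side is probably not running: give up the CPU for real.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        ++sleeps;
    }
}
```

The returned count is only there to make the behaviour observable. In the fast path, where the reader keeps up, it stays at zero and no sleep is ever issued.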

Upvotes: 1

user541686

Reputation: 210765

You need the SwitchToThread() function (which will only relinquish its time slice if something else can run), not Sleep(0) (which would relinquish its time slice even if nothing else can run).
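A polling loop built on SwitchToThread() might be sketched as follows. The helper names and the `max_yields` cap are assumptions for illustration; on non-Windows builds `std::this_thread::yield()` is used as a stand-in and cannot report whether a switch happened.

```cpp
#include <cassert>
#include <thread>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: offer the rest of the time slice to another ready
// thread. Returns true only if the OS reported that a switch took place.
inline bool yield_time_slice() {
#ifdef _WIN32
    return SwitchToThread() != 0;  // nonzero means another thread was run
#else
    std::this_thread::yield();     // stand-in; no result to report
    return false;
#endif
}

// Poll `ready`, yielding between attempts, for at most `max_yields` yields.
// Returns the number of yields performed.
template <typename Predicate>
long poll_with_yield(Predicate ready, long max_yields) {
    assert(max_yields >= 0);
    long yields = 0;
    while (yields < max_yields && !ready()) {
        yield_time_slice();
        ++yields;
    }
    return yields;
}
```

The cap keeps the loop from spinning forever if the other side has died; a caller could switch to a blocking wait once it is reached.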

If you're writing code that's designed to take advantage of hyperthreading, YieldProcessor might do something for you too, but I doubt that'll be helpful.

Upvotes: 1

paxdiablo

Reputation: 882686

If you're operating under .Net, you can look into the Thread::Yield() method.

It may or may not help with your specific scenario, but it's the correct way to notify the scheduler that you want to relinquish the remainder of your timeslice.

If you're running in a pre-.Net environment (seems unlikely if you're on Windows 7), you can look into the Win32 SwitchToThread() function instead.

Upvotes: 0

Billy ONeal

Reputation: 106609

std::this_thread::yield() probably does what you want. I believe it just calls Sleep with 0 in most implementations.

Upvotes: 2

mkimball

Reputation: 784

According to the MSDN documentation, on XP or newer, calling Sleep with a timeout of 0 will yield to other threads of equal priority:

A value of zero causes the thread to relinquish the remainder of its time slice to any other thread of equal priority that is ready to run. If there are no other threads of equal priority ready to run, the function returns immediately, and the thread continues execution.

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686298(v=vs.85).aspx

Another option, which requires more work but is more reliable, is to share an event handle between the producer and consumer processes. You can use CreateEvent to create the event and DuplicateHandle to get it into the other process. When the producer fills the buffer, it calls ResetEvent on the event handle and then waits on it with WaitForSingleObject. When the consumer has removed some data from the full shared buffer, it calls SetEvent, which wakes the producer waiting in WaitForSingleObject.
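A rough sketch of the event approach follows, with several assumptions stated up front. The class name `SpaceAvailableEvent` and the event name `Local\ExampleRingSpace` are invented. It uses an auto-reset event instead of the manual ResetEvent call described above, and a named event instead of DuplicateHandle, simply to keep the example short. The non-Windows branch is an in-process stand-in so the sketch compiles elsewhere; it is not an inter-process mechanism.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper: "space became available in the ring buffer".
class SpaceAvailableEvent {
public:
    SpaceAvailableEvent(const SpaceAvailableEvent&) = delete;
    SpaceAvailableEvent& operator=(const SpaceAvailableEvent&) = delete;
#ifdef _WIN32
    // Auto-reset (bManualReset = FALSE), initially non-signaled. A second
    // process creating an event with the same name gets the same object.
    SpaceAvailableEvent()
        : handle_(CreateEventW(nullptr, FALSE, FALSE, L"Local\\ExampleRingSpace")) {
        assert(handle_ != nullptr);
    }
    ~SpaceAvailableEvent() { if (handle_) CloseHandle(handle_); }
    void signal() { SetEvent(handle_); }  // consumer: data was removed
    bool wait(unsigned timeout_ms) {      // producer: block until signaled
        return WaitForSingleObject(handle_, timeout_ms) == WAIT_OBJECT_0;
    }
private:
    HANDLE handle_;
#else
    SpaceAvailableEvent() = default;
    void signal() {
        { std::lock_guard<std::mutex> lock(mutex_); set_ = true; }
        cv_.notify_one();
    }
    bool wait(unsigned timeout_ms) {
        std::unique_lock<std::mutex> lock(mutex_);
        const bool signaled = cv_.wait_for(
            lock, std::chrono::milliseconds(timeout_ms), [this] { return set_; });
        set_ = false;  // mimic auto-reset: one signal releases one waiter
        return signaled;
    }
private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool set_ = false;
#endif
};
```

The producer would call `wait()` only after finding the buffer full, and the consumer would call `signal()` after each read that frees space. Passing a timeout rather than INFINITE lets the producer re-check the buffer periodically in case a signal was missed.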

Upvotes: 3
