Johan Råde

Reputation: 21418

Visual Studio C++ compiler options: Why does /O2 define /Gs?

The Visual Studio C++ compiler option /O2 (maximize speed) is equivalent to

/Og /Oi /Ot /Oy /Ob2 /Gs /GF /Gy

Why /Gs? How does it help maximize speed? (Note that it is /Gs, not /GS.)

Upvotes: 16

Views: 1852

Answers (2)

Glen Knowles

Reputation: 608

/O2 doesn't set /Gs; that's an error in the documentation.

Some experimentation (it's easy to see the __chkstk calls in the generated assembly) shows that:

/Gs (with no number) is equivalent to /Gs0 and means "always insert __chkstk calls". Indeed, MSDN says the same:

If the /Gs option is specified without a size argument, it is the same as specifying /Gs0,

/O2 does not set /Gs (aka /Gs0); there's a clear difference between "/O2" and "/O2 /Gs". It's possible that /O2 changes the default probe threshold to something other than the page size, but it seems more likely that this is just a documentation bug.

Stack probes are never good for performance. A probe only has a job to do when the stack is advancing to a new high-water mark; the rest of the time it is wasted cycles. If you have a loop that calls a function 100 times, that function's stack probe might grow the stack on the first call, but on the other 99 calls it changes nothing, because the stack was already grown the first time, if it needed to be grown at all.
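For anyone who wants to repeat the experiment, here is a minimal sketch of a test file. The file name probe_demo.cpp and the function names are made up for illustration, and the cl command lines in the comments are what I would expect to work rather than something taken from the question. Note that the optimizer may inline big_frame into its caller, in which case the large frame (and any probe) shows up there instead.

```cpp
// Hypothetical test file (probe_demo.cpp) for inspecting stack-probe calls.
// Assumed MSVC invocations, each writing an assembly listing (probe_demo.asm):
//   cl /c /O2 /FAs probe_demo.cpp
//   cl /c /O2 /Gs /FAs probe_demo.cpp
// Search each listing for __chkstk and compare the two.
#include <cassert>
#include <cstddef>

// 16 KB of locals: larger than a typical 4 KB page, so this frame should be
// a candidate for a stack probe even under the default settings (assumption).
unsigned big_frame(unsigned seed) {
    volatile unsigned char buffer[16 * 1024];
    // Write one byte per 4 KB so the buffer cannot be optimized away.
    for (std::size_t i = 0; i < sizeof(buffer); i += 4096) {
        buffer[i] = static_cast<unsigned char>(seed + i / 4096);
    }
    return buffer[0] + buffer[4096] + buffer[8192] + buffer[12288];
}

// Calls big_frame repeatedly from the same stack depth. Only the first call
// can push the stack to a new high-water mark; later calls reuse that space.
unsigned call_repeatedly(int calls) {
    assert(calls >= 0);
    unsigned total = 0;
    for (int i = 0; i < calls; ++i) {
        total += big_frame(static_cast<unsigned>(i));
    }
    return total;
}
```

Calling call_repeatedly(100) from main mirrors the 100-iteration loop described above: any probe in big_frame runs on every call, but the stack can only actually grow on the first one.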

Upvotes: 2

MSalters

Reputation: 180225

/Gs will insert stack probes in functions that use more than one page (typically 4 KB) of local variables. A stack probe signals to the OS that you'll use a lot of stack space. If this probe hits the guard page, the OS knows that it will need to allocate an extra page of RAM for the stack to grow.

This is an optimization, because without the probe the actual memory access would trigger the RAM allocation, and the function would stall until the RAM is allocated. The probe, as part of the function prolog, doesn't cause a stall (at least not as severe).

[edit] Another benefit is that a stack probe up front will allocate memory once. If you need 16 KB of stack space and rely on allocation-on-demand, you'll have 4 page faults that each grow the stack by 4 KB. A single probe can replace these 4 faults with one syscall.
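The 16 KB / 4 KB arithmetic can be modelled with a few lines of portable C++. This is only a sketch: touch_each_page and pages_for_frame are hypothetical names, the buffer lives on the heap rather than the stack, and it is not how __chkstk is implemented. It only counts how many distinct pages a frame of a given size spans, i.e. how many first-touch page faults on-demand growth could incur.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Writes one byte in every page-sized step of [base, base + size) and
// returns how many steps were touched. A model of page-at-a-time access,
// not the real probe routine.
std::size_t touch_each_page(volatile unsigned char* base,
                            std::size_t size,
                            std::size_t page_size) {
    assert(page_size > 0);
    std::size_t touched = 0;
    for (std::size_t offset = 0; offset < size; offset += page_size) {
        base[offset] = 0;  // on a real stack, a first touch may fault
        ++touched;
    }
    return touched;
}

// Convenience wrapper: models a frame of frame_bytes using a heap buffer.
std::size_t pages_for_frame(std::size_t frame_bytes, std::size_t page_size) {
    std::vector<unsigned char> frame(frame_bytes);
    return touch_each_page(frame.data(), frame.size(), page_size);
}
```

With a 4096-byte page, pages_for_frame(16 * 1024, 4096) returns 4, matching the four pages in the example above.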

Upvotes: 13
