How does Linux extract the sixth parameter of a syscall?

Question

On 32-bit Intel architecture, the mmap2 system call takes 6 parameters, and the sixth one is passed in the %ebp register. However, right before entering the kernel via sysenter, this happens (in linux-gate.so.1, the page of code the kernel maps into every user process):

push %ebp
movl %esp, %ebp
sysenter

This means %ebp now holds the user-space stack pointer instead of the sixth parameter. How does Linux still get the parameter right?

Peter Cordes · Accepted Answer

The blog post you linked in the comments links to a post by Linus, which gave me the clue to the answer:

Which means that now the kernel can happily trash %ebp as part of the sixth argument setup, since system call restarting will re-initialize it to point to the user-level stack that we need in %ebp because otherwise it gets totally lost.

I'm a disgusting pig, and proud of it to boot.

-- Linus Torvalds

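It turns out sysenter is designed to require user space to cooperate with the kernel in saving the return address and the user-space stack pointer. (Upon entering the kernel, %esp points to the kernel stack, loaded from an MSR.) sysenter does far less work than int 0x80. It saves no state at all and just loads CS, EIP and ESP from MSRs, deriving SS from CS. In contrast, int 0x80 goes through an IDT gate and pushes the user SS, ESP, EFLAGS, CS and EIP onto the kernel stack. That is why sysenter is much faster.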

After entry, the kernel has user space's %esp value in %ebp, which it needs anyway. It reads the 6th parameter from user-space stack memory, along with the return address for SYSEXIT. Immediately after entry, (%ebp), the word that %ebp points to, holds the 6th syscall parameter. This matches the standard int 0x80 ABI, in which user space puts the 6th parameter in %ebp directly. Here there is just one extra level of indirection, so the kernel needs a single load through %ebp to get it back.

From Michael's comment: "Here's the 32-bit sysenter_target code: look at the part starting at line 417"

From Intel's instruction reference manual entry for SYSENTER (links in the x86 wiki):

The SYSENTER and SYSEXIT instructions are companion instructions, but they do not constitute a call/return pair. When executing a SYSENTER instruction, the processor does not save state information for the user code (e.g., the instruction pointer), and neither the SYSENTER nor the SYSEXIT instruction supports passing parameters on the stack. To use the SYSENTER and SYSEXIT instructions as companion instructions for transitions between privilege level 3 code and privilege level 0 operating system procedures, the following conventions must be followed:

The segment descriptors for the privilege level 0 code and stack segments and for the privilege level 3 code and stack segments must be contiguous in a descriptor table. This convention allows the processor to compute the segment selectors from the value entered in the SYSENTER_CS_MSR MSR.

The fast system call “stub” routines executed by user code (typically in shared libraries or DLLs) must save the required return IP and processor state information if a return to the calling procedure is required. Likewise, the operating system or executive procedures called with SYSENTER instructions must have access to and use this saved return and state information when returning to the user code.
