Reputation: 31
I am currently following along with this tutorial, but I'm not a student of that school.
GDB gives me a segmentation fault in thread_start
on the line:
movq %rsp, (%rdi) # save sp in old thread's tcb
Here's additional info when I backtrace:
#0 thread_start () at thread_start.s:16
#1 0x0000000180219e83 in _cygtls::remove(unsigned int)::__PRETTY_FUNCTION__
() from /usr/bin/cygwin1.dll
#2 0x00000000ffffcc6b in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Being a newbie, I can't for my life figure out why. Here is my main file:
#define STACK_SIZE 1024*1024
//Thread TCB
struct thread {
unsigned char * stack_pointer;
void(*initial_function)(void *);
void * initial_argument;
};
struct thread * current_thread;
struct thread * inactive_thread;
void thread_switch(struct thread * old_t, struct thread * new_t);
void thread_start(struct thread * old_t, struct thread * new_t);
void yield() {
//swap threads
struct thread * temp = current_thread;
current_thread = inactive_thread;
inactive_thread = temp;
thread_switch(inactive_thread, current_thread);
}
void thread_wrap() {
// call the thread's function
current_thread->initial_function(current_thread->initial_argument);
yield();
}
int factorial(int n) {
return n == 0 ? 1 : n * factorial(n - 1);
}
// calls and print the factorial
void fun_with_threads(void * arg) {
int n = *(int*)arg;
printf("%d! = %d\n", n, factorial(n));
}
int main() {
//allocate memory for threads
inactive_thread = (struct thread*) malloc(sizeof(struct thread));
current_thread = (struct thread*) malloc(sizeof(struct thread));
// argument for factorial
int *p= (int *) malloc(sizeof(int));
*p = 5;
// intialise thread
current_thread->initial_argument = p;
current_thread->initial_function = fun_with_threads;
current_thread->stack_pointer = ((unsigned char*) malloc(STACK_SIZE)) + STACK_SIZE;
thread_start(inactive_thread, current_thread);
return 0;
}
Here's my asm code for thread_start
# Inline comment
/* Block comment */
# void thread_switch(struct thread * old_t, struct thread * new_t);
.globl thread_start
thread_start:
pushq %rbx # callee-save
pushq %rbp # callee-save
pushq %r12 # callee-save
pushq %r13 # callee-save
pushq %r14 # callee-save
pushq %r15 # callee-save
movq %rsp, (%rdi) # save sp in old thread's tcb
movq (%rsi), %rsp # load sp from new thread
jmp thread_wrap
and thread_switch:
# Inline comment
/* Block comment */
# void thread_switch(struct thread * old_t, struct thread * new_t);
.globl thread_switch
thread_switch:
pushq %rbx # callee-save
pushq %rbp # callee-save
pushq %r12 # callee-save
pushq %r13 # callee-save
pushq %r14 # callee-save
pushq %r15 # callee-save
movq %rsp, (%rdi) # save sp in old thread's tcb
movq (%rsi), %rsp # load sp from new thread
popq %r15 # callee-restore
popq %r14 # callee-restore
popq %r13 # callee-restore
popq %r12 # callee-restore
popq %rbp # callee-restore
popq %rbx # callee-restore
ret # return
Upvotes: 3
Views: 570
Reputation: 364068
You're on cygwin, right? It uses the Windows x64 calling convention by default, not the System V x86-64 psABI. So your args aren't in %rdi
and %rsi
.
The calling convention is Windows x64, but the ABI is slightly different: long
is 64 bit, so it's LP64 not LLP64. See the cygwin docs.
You could override the default with __attribute__((sysv_abi))
on the prototype, but that only works for compilers that understand GNU C.
Agner Fog's calling convention guide has some suggestions on how to write source code that assembles to working functions on Windows vs. non-Windows. The most straightforward thing is to use an #ifdef
to choose different function prologues.
This Intel intro to x64 assembly is somewhat Windows-centric, and details the Windows x64 __fastcall
calling convention.
(It's followed by examples and stuff. It's a pretty big and good tutorial that starts from very basic stuff, including how to use tools like an assembler. I'd recommend it for learning x86-64 asm in a Windows dev environment, and maybe in general.)
Windows x64
__fastcall
(like x64__vectorcall
but doesn't pass vectors in vector regs)
- RCX, RDX, R8, R9 are used for integer and pointer arguments in that order left to right
- XMM0, 1, 2, and 3 are used for floating point arguments.
- Additional arguments are pushed on the stack left to right.
- Parameters less than 64 bits long are not zero extended; the high bits contain garbage.
- It is the caller's responsibility to allocate 32 bytes of "shadow space" (for storing RCX, RDX, R8, and R9 if needed) before calling the function.
- It is the caller's responsibility to clean the stack after the call.
- Integer return values (similar to x86) are returned in RAX if 64 bits or less.
- Floating point return values are returned in XMM0.
- Larger return values (structs) have space allocated on the stack by the caller, and RCX then contains a pointer to the return space when the callee is called. Register usage for integer parameters is then pushed one to the right. RAX returns this address to the caller.
- The stack is 16-byte aligned. The "call" instruction pushes an 8-byte return value, so the all non-leaf functions must adjust the stack by a value of the form 16n+8 when allocating stack space.
- Registers RAX, RCX, RDX, R8, R9, R10, and R11 are considered volatile and must be considered destroyed on function calls. RBX, RBP, RDI, RSI, R12, R14, R14, and R15 must be saved in any function using them.
- Note there is no calling convention for the floating point (and thus MMX) registers.
- Further details (varargs, exception handling, stack unwinding) are at Microsoft's site.
Links to MS's calling-convention docs in the x86 tag wiki (along with System V ABI docs, and tons of other good stuff).
See also Why does Windows64 use a different calling convention from all other OSes on x86-64?
Upvotes: 5