Reputation: 101
When I call fclose on a global FILE * stream, the program hangs.
The hang occurs after several threads created with clone() have exited.
Below is the sequence:
FILE * fid = fopen("filename", "w");
...
for (int i = 0; i < 4; i++) {
    clone((int (*)(void*))do_work, stack[i], CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|SIGCHLD|CLONE_CHILD_CLEARTID|CLONE_PARENT_SETTID|CLONE_IO, NULL, &(ctid[i]), NULL, &(ctid[i]));
}
...
fclose(fid);
None of the threads touches fid.
According to strace, the program hangs in a futex wait on "main_arena". I think this is a mutex inside glibc.
Backtrace:
#0 0x0000003f09edf9ee in __lll_lock_wait_private () from /lib64/libc.so.6
#1 0x0000003f09e76d31 in _L_lock_5478 () from /lib64/libc.so.6
#2 0x0000003f09e71c8d in _int_free () from /lib64/libc.so.6
#3 0x0000003f09e7273b in free () from /lib64/libc.so.6
#4 0x0000003f09e60d5b in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
This happens on Linux with glibc 2.5, but not on Linux with glibc 2.12.
I am wondering whether this is because threads cannot be created with a raw clone() call like this. NPTL does a lot more when it creates a thread, such as calling set_robust_list() and setting up thread-local storage.
Thanks!
Upvotes: 0
Views: 1785
Reputation: 1881
What is your kernel version?
It looks like a kernel bug.
See the futex_wait bug and the kernel patch for more information.
Upvotes: 0
Reputation: 182769
I can't imagine how you would expect this to work. The stdio library uses locks internally. Locks are specific to the threading model being used. You are using your own threading model, but expecting the stdio library's locks to magically work with it. That's clearly not a reasonable expectation.
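For comparison, here is a minimal sketch of the same sequence using pthread_create, so that the threads are set up by the same library that owns the stdio and malloc locks. The do_work body, the run_workers_and_close name, and the slots array are placeholders invented for illustration, not taken from the question, and I have not tested this against glibc 2.5 specifically.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4

/* Hypothetical stand-in for the question's do_work(); it never touches the FILE. */
static void *do_work(void *arg)
{
    int *slot = arg;
    *slot += 1;                 /* placeholder work */
    return NULL;
}

/* Opens path, runs the workers to completion, then closes the stream.
   Returns 0 on success, -1 on any failure. */
int run_workers_and_close(const char *path)
{
    FILE *fid = fopen(path, "w");
    if (fid == NULL)
        return -1;

    pthread_t tid[NUM_WORKERS];
    int slots[NUM_WORKERS] = {0};
    int started = 0;

    /* pthread_create lets the threading library allocate the stack and
       initialise its own per-thread state before do_work runs. */
    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&tid[i], NULL, do_work, &slots[i]) != 0)
            break;
        started++;
    }

    /* Join every thread that was started, so none is still running
       when the stream is closed. */
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    int rc = (started == NUM_WORKERS) ? 0 : -1;
    if (fclose(fid) != 0)
        rc = -1;
    return rc;
}
```

Call run_workers_and_close("filename") from main and build with -pthread. The point is only that thread creation and teardown go through the library, rather than through a hand-rolled clone() whose threads the library knows nothing about.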
Upvotes: 0