Reputation: 101
I had this code:
int main(int argc, char** argv)
{
pthread_t thread[thr_num];
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
// just for debugging //
struct rlimit rlim;
getrlimit(RLIMIT_NPROC, &rlim);
printf ("soft = %d \n", rlim.rlim_cur);
printf ("hard = %d \n", rlim.rlim_max);
////
for ( i = 1 ; i <= thr_num ; i++) {
if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
}
pthread_attr_destroy(&attr);
for ( i = 1 ; i <= thr_num ; i++) {
if( pthread_join(thread[i], (void**)&status ) ) {
exit(1);
}
}
return 0;
}
void* loggerThread(void* data)
{
char** sthg = ((char**)data);
pthread_exit(NULL);
}
I don't understand why when I run this code with thr_num=291, I got an error: pthread_create failure, i = 291, errno = 11 (EAGAIN)
with thr_num=290 worked fine. I run this code on a Linux 2.6.27.54-0.2-default (SLES 11) The rlim.rlim_cur has value 6906 the rlim.rlim_max also. The same I saw with 'ulimit -a' for 'max user processes'. I checked also /proc/sys/kernel/threads-max (it was 13813) guided by pthread_create man page. Did not find any parameters with value 290 for 'sysctl -a' output either.
Ocassionally I found out from this link: pthread_create and EAGAIN that: "Even if pthread_exit or pthread_cancel is called, the parent process still need to call pthread_join to release the pthread ID, which will then become recyclable"
so just as a try I modified my code to this:
for ( i = 1 ; i <= thr_num ; i++) {
if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
if( pthread_join(thread[i], (void**)&status ) ) {
printf("pthread_join failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
}
pthread_attr_destroy(&attr);
and then everything worked: I didn't get the error at 291 cycle.
I would like to understand why with my original code I got the error: 1. because of a wrong programing with threads 2. or I hit some system limit what I couldn't identify
Also would like to know if my correction is good for this problem or what hidden things, pitfalls I eventually introduced with this solution ? Thanks !
Upvotes: 3
Views: 4535
Reputation: 906
I initially wrote this as a comment, but just in case...
Your code:
for ( i = 1 ; i <= thr_num ; i++) {
if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
exit(1);
}
}
...
for ( i = 1 ; i <= thr_num ; i++) {
if( pthread_join(thread[i], (void**)&status ) ) {
exit(1);
}
}
In both the for() loops you check from 1 - thr_num. This means you are out of bounds in your array thread[thr_num] since arrays start at index 0. You should thus iterate from 0 to one less than thr_num:
for ( i = 0 ; i < thr_num ; i++)
I'm actually surprised you didn't get a segmentation fault before hitting 291 as thr_num.
Upvotes: 2
Reputation: 229108
I would like to understand why with my original code I got the error: 1. because of a wrong programing with threads 2. or I hit some system limit what I couldn't identify
You likely hit a system limit. Likely you ran out of address space. Default, each thread gets 8-10Mb of stack space on linux. If you create 290 threads, that's using nearly 3Gb of address space - the max for a 32 bit process.
You get EAGAIN in such a case, since there arn't enough resources to create the thread just now (since there isn't enough address space available at the time).
When a thread exits, not all resources of the thread is released (on linux, the entire stack of the thread is kept around).
If the thread is in a detached state, e.g. you called pthread_detach() or specified a detached state when it was created as an attribute to pthread_create(), all resources are release when the thread exits - but you can't pthread_join() a detached thread.
If the thread is not detached, you need to call pthread_join() on it to release the resources.
Note that the modified code of yours where you call pthread_join() inside the loop will:
i.e. only one other thread is running at a time - which seems a bit pointless.
You can certainly spawn more than one thread that run concurrently - but there's a limit. On your machine, you seem to have found the limit to be around 290.
Upvotes: 5