wildfrontier

Reputation: 101

pthread_create() fails with EAGAIN on the 291st iteration

I had this code:

int main(int argc, char** argv)
{
  pthread_t thread[thr_num];
  pthread_attr_t attr;
  pthread_attr_init(&attr);
  pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

  // just for debugging //
    struct rlimit rlim;
    getrlimit(RLIMIT_NPROC, &rlim);
    printf ("soft = %d \n", rlim.rlim_cur);
    printf ("hard = %d \n", rlim.rlim_max);
  ////

  for ( i = 1 ; i <= thr_num ; i++) {
    if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
      printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
      exit(1);
    }
  }

  pthread_attr_destroy(&attr);

  for ( i = 1 ; i <= thr_num ; i++) {
    if( pthread_join(thread[i], (void**)&status ) ) {
      exit(1);
    }
  }  

  return 0;
}

void* loggerThread(void* data) 
{
  char** sthg = ((char**)data);
  pthread_exit(NULL);
}

I don't understand why, when I run this code with thr_num=291, I get this error: pthread_create failure, i = 291, errno = 11 (EAGAIN)

With thr_num=290 it worked fine. I ran this code on Linux 2.6.27.54-0.2-default (SLES 11). Both rlim.rlim_cur and rlim.rlim_max have the value 6906, which matches what 'ulimit -a' reports for 'max user processes'. Following the pthread_create man page, I also checked /proc/sys/kernel/threads-max (it was 13813). I did not find any parameter with the value 290 in the 'sysctl -a' output either.

By chance I found out from this link: pthread_create and EAGAIN that: "Even if pthread_exit or pthread_cancel is called, the parent process still need to call pthread_join to release the pthread ID, which will then become recyclable"

So, as an experiment, I modified my code to this:

for ( i = 1 ; i <= thr_num ; i++) {
  if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
    printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
    exit(1);
  }

  if( pthread_join(thread[i], (void**)&status ) ) {
    printf("pthread_join failure, i = %d, errno = %d \n", i, errno);
    exit(1);
  }     
}
pthread_attr_destroy(&attr);

and then everything worked: I no longer got the error on the 291st iteration.

I would like to understand why my original code produced the error: 1. because of incorrect thread programming, or 2. because I hit some system limit that I couldn't identify.

I would also like to know whether my correction is a good fix for this problem, and what hidden pitfalls I may have introduced with it. Thanks!

Upvotes: 3

Views: 4535

Answers (2)

PurpleAlien

Reputation: 906

I initially wrote this as a comment, but just in case...

Your code:

  for ( i = 1 ; i <= thr_num ; i++) {
    if(pthread_create( &thread[i], &attr, loggerThread, (void*)argv ) ) {
      printf("pthread_create failure, i = %d, errno = %d \n", i, errno);
      exit(1);
    }
  }
...
  for ( i = 1 ; i <= thr_num ; i++) {
    if( pthread_join(thread[i], (void**)&status ) ) {
      exit(1);
    }
  }  

In both for() loops you iterate from 1 to thr_num. This means you go out of bounds on your array thread[thr_num], since array indices start at 0. You should instead iterate from 0 to one less than thr_num:

for ( i = 0 ; i < thr_num ; i++)

I'm actually surprised you didn't get a segmentation fault before hitting 291 as thr_num.
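
For reference, here is a minimal sketch of the two loops with zero-based indexing. The names spawn_and_join and worker are made up for illustration (worker stands in for loggerThread). It also reports the return value of pthread_create() rather than errno, because the pthread functions return the error number directly and do not set errno.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for loggerThread: does nothing and returns. */
static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Creates n joinable threads, then joins every thread that was created.
 * Returns 0 on success, or the first pthread error number encountered. */
int spawn_and_join(int n)
{
    assert(n > 0);          /* a variable-length array needs a positive size */
    pthread_t threads[n];   /* valid indices are 0 .. n-1 */
    int created = 0;
    int rc = 0;

    for (int i = 0; i < n; i++) {
        rc = pthread_create(&threads[i], NULL, worker, NULL);
        if (rc != 0) {
            /* rc holds the error (e.g. EAGAIN); errno is not set */
            fprintf(stderr, "pthread_create failed at i = %d: %s\n",
                    i, strerror(rc));
            break;
        }
        created++;
    }

    /* Join only the threads that actually exist. */
    for (int i = 0; i < created; i++) {
        int jrc = pthread_join(threads[i], NULL);
        if (jrc != 0 && rc == 0)
            rc = jrc;
    }
    return rc;
}
```

Calling spawn_and_join(thr_num) from main() would replace both loops; on older toolchains you may need to compile with -pthread.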

Upvotes: 2

nos

Reputation: 229108

I would like to understand why my original code produced the error: 1. because of incorrect thread programming, or 2. because I hit some system limit that I couldn't identify.

You likely hit a system limit: most likely you ran out of address space. By default, each thread gets 8-10 MB of stack space on Linux. If you create 290 threads, that uses nearly 3 GB of address space, which is the maximum for a 32-bit process.

You get EAGAIN in such a case because there aren't enough resources to create the thread right now (there isn't enough address space available at that moment).

When a thread exits, not all of its resources are released (on Linux, the entire stack of the thread is kept around).

  • If the thread is in a detached state, e.g. you called pthread_detach() or specified a detached state when it was created as an attribute to pthread_create(), all resources are released when the thread exits - but you can't pthread_join() a detached thread.
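
To make the arithmetic concrete, here is a rough sketch. The function names are made up for illustration, and the 3 GB figure is the assumption from above, not something measured on your machine. The first function divides the address space by the per-thread stack size to get an upper bound; the second asks the pthread library which stack size a thread created with default attributes would get (on Linux with glibc this normally follows the stack size resource limit).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Upper bound on how many thread stacks of stack_bytes fit into
 * address_space_bytes. Real programs fit fewer, because code, heap and
 * shared libraries also occupy address space. */
unsigned long long estimate_max_threads(unsigned long long stack_bytes,
                                        unsigned long long address_space_bytes)
{
    assert(stack_bytes > 0);
    return address_space_bytes / stack_bytes;
}

/* Stack size used for threads created with default attributes,
 * or 0 if it could not be queried. */
size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}
```

With 10 MB stacks and 3 GB of address space, estimate_max_threads(10ULL << 20, 3ULL << 30) gives 307, and with 8 MB stacks it gives 384. A failure at around 290 threads would be consistent with the larger stack size once the rest of the process is accounted for, but that is an estimate, not a diagnosis.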

  • If the thread is not detached, you need to call pthread_join() on it to release the resources.
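
As a sketch of the first option, a thread can be created already detached by setting the detach state on the attribute object. The function name spawn_detached is made up for illustration; the pthread calls are the standard ones.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Starts fn(arg) in a detached thread. Its resources are released when it
 * exits, and it must not be passed to pthread_join().
 * Returns 0 on success, or the pthread error number. */
int spawn_detached(void *(*fn)(void *), void *arg)
{
    assert(fn != NULL);

    pthread_attr_t attr;
    pthread_t tid;

    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&tid, &attr, fn, arg);

    /* The attribute object is no longer needed once the thread exists. */
    pthread_attr_destroy(&attr);

    if (rc != 0)
        fprintf(stderr, "spawn_detached failed: %s\n", strerror(rc));
    return rc;
}
```

Because a detached thread cannot be joined, main() needs some other way to know the work is finished (or must not care) before the process exits.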

Note that the modified code of yours where you call pthread_join() inside the loop will:

  1. spawn a thread
  2. Wait for that thread to finish
  3. go to 1

i.e. only one other thread is running at a time - which seems a bit pointless.

You can certainly spawn more than one thread that run concurrently - but there's a limit. On your machine, you seem to have found the limit to be around 290.
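
If you do need more threads alive at once, one common option (not something your code does, so treat this as an assumption about what you want) is to request a smaller stack per thread with pthread_attr_setstacksize(). A minimal sketch, with made-up names spawn_with_stack_size and worker:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical worker: does nothing and returns. */
static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Creates n joinable threads that each get a stack of stack_bytes, then
 * joins them. Returns 0 on success, or the pthread error number
 * (EINVAL if stack_bytes is below the system minimum). */
int spawn_with_stack_size(int n, size_t stack_bytes)
{
    assert(n > 0);

    pthread_attr_t attr;
    pthread_t threads[n];
    int created = 0;

    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setstacksize(&attr, stack_bytes);
    for (int i = 0; rc == 0 && i < n; i++) {
        rc = pthread_create(&threads[i], &attr, worker, NULL);
        if (rc == 0)
            created++;
    }
    pthread_attr_destroy(&attr);

    /* Joining releases each thread's stack. */
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);

    if (rc != 0)
        fprintf(stderr, "spawn_with_stack_size: %s\n", strerror(rc));
    return rc;
}
```

For example, spawn_with_stack_size(thr_num, 256 * 1024) asks for 256 KB per thread instead of the default. Whether that is enough depends on how deep your thread function's call chain goes and how much it puts on the stack.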

Upvotes: 5
