Reputation: 59
I have a web server that forks into sub-processes, each of which handles multiple clients with one thread per client.
There is one problem, though: sometimes accept() fails with "Bad file descriptor". I see the error when testing the web server with ab. For example, the first 500 requests are processed fine, but then accept() fails with Bad file descriptor, and the socket fd suddenly holds a very large integer (no longer a valid descriptor).
I only close the socket once, at the end of the program, so it doesn't make sense that it would suddenly be closed, and gdb breakpoints confirm that the program exits properly.
I also have another issue: a thread created by a child process exits "randomly". I know it's not really random, but I can't find out why it exits, since nothing is printed to the screen. I call it "random" because helgrind tells me the exiting thread still holds two locks (I have a mutex nested inside another mutex, which I need; I have also tried using only one mutex and the problem remains). Is there anything that could cause a thread to exit like this?
I can provide the code, but it's fairly long and much of it (including variable and function names) is in Portuguese.
UPDATE
Here's a snippet of my code:
void *gere_processo(void*pst)
{
process_struct pstruct = *(process_struct*)pst;
int fork_id;
int fd[2];
if(pipe(fd)==-1)
{
perror("Erro a criar pipe: ");
exit(-1);
}
// Starts at 0 before forking!
nthreads = 0;
mainpid = getpid();
printf("\nEntering thread %d\n\n", (unsigned int)pthread_self());
// This way we don't need the mutex further down when we use these values read at function entry
pthread_mutex_lock(&processofilhomux);
processofilho++;
int localprocessofilho = processofilho;
pthread_mutex_unlock(&processofilhomux);
pthread_mutex_lock(&processopaimux);
processopai++;
int localprocessopai = processopai;
pthread_mutex_unlock(&processopaimux);
fork_id=fork();
if (fork_id==-1)
{
perror("Erro a criar filho:");
exit(-1);
}
else if(fork_id==0)
{
if(signal(SIGUSR1, func_forcereadconfig) == SIG_ERR)
printf("Erro no sinal SIGUSR1");
if(signal(SIGINT, func_forcedsigint) == SIG_ERR)
printf("Erro no sinal SIGINT");
/***********************************************************************
* Child code
* Waits for clients and creates a thread for each one.
* Whenever a client is created, sends a message to the parent with the number of active clients.
* Whenever a client terminates, sends a message to the parent with the number of active clients.
*************************************************************************/
int i, active_n=0;
char address[BUFFSIZE];
int msgsock;
char nbuffer[50];
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr,PTHREAD_CREATE_JOINABLE);
thread_s threadtable[MAXCLIENTSPERFORK];
int a;
for(a=0;a<MAXCLIENTSPERFORK;a++)
{
threadtable[a].thread = 0;
threadtable[a].active = false;
threadtable[a].t_buffer = NULL;
}
// Start time
// If no requests arrive within X seconds, the forked child exits (unless it is the first child)
struct timeval begin, now;
gettimeofday(&begin, NULL);
int timeout = 3;
double timediff;
while(cicle)
{
gettimeofday(&now , NULL);
// Time elapsed since we started the timer
timediff = (now.tv_sec - begin.tv_sec) + 1e-6 * (now.tv_usec - begin.tv_usec);
//printf("\nServer (2) Socket: %d\n", pstruct.recsock);
pthread_mutex_lock(&nthreadsmux);
if(localprocessofilho == 2)
{
printf("\n[%d] - Tempo que passou: %lf - %d\n", localprocessofilho, timediff, nthreads);
}
// Conditions for running:
// - nthreads == 0 and timediff < timeout (we want to keep trying until the timeout expires)
// - nthreads < MAXCLIENTSPERFORK but greater than zero
// - nthreads < MAXCLIENTSPERFORK and this is the first child process (always runs)
if((nthreads == 0 && timeout > timediff) || (nthreads < MAXCLIENTSPERFORK && nthreads > 0) || (nthreads < MAXCLIENTSPERFORK && localprocessofilho == 1))
{
pthread_mutex_unlock(&nthreadsmux);
// Wait for a client
printf("\tProcesso %d espera de clientes\n", localprocessofilho);
if((msgsock = espera_pedido(pstruct.recsock, address)) < 0)
continue;
pthread_mutex_lock(&athreadmux);
// Figure out our thread number
int a;
for(a=0;a<MAXCLIENTSPERFORK;a++)
{
if(threadtable[a].active == false)
{
if(threadtable[a].thread != 0)
pthread_join(threadtable[a].thread, NULL); // Make sure it has finished before reusing the slot
break;
}
}
threadtable[a].t_buffer = malloc(sizeof(thread_buffer));
if(threadtable[a].t_buffer == NULL)
{
perror("\nErro (0) ao alocar estrutura config:");
exit(-1);
}
threadtable[a].t_buffer->msg = malloc(sizeof(char)*BUFFSIZE);
if(threadtable[a].t_buffer->msg == NULL)
{
perror("\nErro (0) ao alocar estrutura config:");
exit(-1);
}
sprintf(threadtable[a].t_buffer->msg, "%s", address);
threadtable[a].t_buffer->sock = msgsock;
threadtable[a].t_buffer->conf = conf;
threadtable[a].active = true;
// localtime is not thread-safe, so we call it outside the threads created for each request
time_t timer = time(NULL);
threadtable[a].t_buffer->t = *localtime(&timer);
threadtable[a].t_buffer->timer = timer;
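// Create a new thread
if (pthread_create(&threadtable[a].thread, &attr, thread_func_pedido, (void*)&threadtable[a]) != 0)
{
printf("\n\nERROR: %d", errno);
perror("Erro a criar thread: ");
close(msgsock);
// Undo the slot setup and release the mutex before retrying; otherwise the next iteration blocks on athreadmux
free(threadtable[a].t_buffer->msg);
free(threadtable[a].t_buffer);
threadtable[a].t_buffer = NULL;
threadtable[a].thread = 0;
threadtable[a].active = false;
pthread_mutex_unlock(&athreadmux);
printf("\n\n\n");
sleep(10);
continue;
}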
pthread_mutex_unlock(&athreadmux);
pthread_mutex_lock(&nthreadsmux);
memset(nbuffer, '\0', sizeof(nbuffer));
sprintf(nbuffer, "%d", nthreads);
if(nthreads >= MAXCLIENTSPERFORK)
{
printf("\nReached MAXCLIENTSPERFORK in Child %d\n", localprocessofilho);
}
pthread_mutex_unlock(&nthreadsmux);
//printf("\nProcesso %d enviar para o pai: %s\n", localprocessofilho, nbuffer);
write(fd[WRITE], nbuffer, (strlen(nbuffer)+1));
// Reset the timer
gettimeofday(&begin, NULL);
}
else if(nthreads == 0 && localprocessofilho > 1)
{
pthread_mutex_unlock(&nthreadsmux);
printf("\n00Reached 0 in Child %d\n", localprocessofilho);
break;
}
else
{
pthread_mutex_unlock(&nthreadsmux);
}
}
for(a=0;a<MAXCLIENTSPERFORK;a++)
{
if(threadtable[a].active != false)
{
pthread_join(threadtable[a].thread, NULL);
}
}
printf("\nSAI FILHO\n\n");
sleep(10);
_exit(0);
}
else
{
/***********************************************************************
* Parent code
* The parent only reads what the child sends through the pipe.
* The child constantly sends the number of active threads.
* When n reaches zero, the child kills itself and the parent waits for it to die.
* When n reaches MAXCLIENTSPERFORK, the process sets wantnewfork to TRUE.
*************************************************************************/
close(fd[WRITE]);
char buf[10];
pid_t result;
int status;
// Check the child's state (if == 0 it is still running)
// If this is the parent process, we are NEVER supposed to exit (unless a signal arrives)
while((result = waitpid(fork_id, &status, WNOHANG)) == 0 || localprocessopai == 1)
{
int retn = read(fd[READ], buf, 10);
if(retn > 0)
{
int n = atoi(buf);
//printf("\n--- PAI: %d |\n", n);
if(n == 0 && localprocessopai > 1)
{
int status = 0;
wait(&status);
printf("\n%d A sair...\n", localprocessopai);
break;
}
else if(n == MAXCLIENTSPERFORK)
{
printf("\n\n\nQUEREMOS UM NOVO PROCESSO - %d!!!\n\n\n", n);
pthread_mutex_lock(&processomux);
wantnewfork = true;
pthread_mutex_unlock(&processomux);
}
}
}
free(pst);
// Time to leave
printf("\n\n\n%d SAIR DA THREAD - %d - %d - %d\n\n", (unsigned int)pthread_self(), localprocessopai, result, WIFEXITED(status));
// Return NULL rather than the address of a local variable, which would dangle once the thread exits
pthread_exit(NULL);
}
}
void *thread_func_pedido(void * threadstruct)
{
thread_s *tstmp = (thread_s*)threadstruct;
thread_buffer t = *(*tstmp).t_buffer;
char buffer[BUFSIZE];
char ver;
pthread_mutex_lock(&nthreadsmux);
nthreads++;
pthread_mutex_unlock(&nthreadsmux);
/* Read the request */
int totalread;
if((totalread = recv(t.sock, buffer, BUFSIZE,0)) <= 0)
{
printf("\nTotal read: %d\n", totalread);
perror("Erro lendo a request: ");
sleep(10);
close(t.sock);
pthread_mutex_lock(&nthreadsmux);
if(nthreads > 0)
nthreads--;
pthread_mutex_unlock(&nthreadsmux);
// Return NULL rather than the address of a local variable, which would dangle once the thread exits
pthread_exit(NULL);
}
req_data pedido;
pedido = init_data();
/* Process the request */
if(buffer == NULL)
{
pedido.errorcode=400;
}
else
{
pedido.request = convert_get(buffer,&pedido.errorcode,&ver);
if(pedido.errorcode==200)
{
if(special_request(t.sock, pedido.request,ver) == false)
{
pedido.errorcode = imprime_ficheiro(t.sock, pedido.request, ver, t.conf->httpdocs, t.conf->cgibin);
}
}
}
if (pedido.errorcode != 200)
{
imprime_erro(t.sock, pedido.errorcode, ver, pedido.request);
}
/*--- Store the request ---*/
pedido = gen_data(pedido.errorcode, (char*)t.msg, pedido.request, tstmp->t_buffer->t, tstmp->t_buffer->timer);
pthread_mutex_lock( &mux );
FILE* statfile = (FILE*) stat_init();
stat_armazena_req(pedido,statfile);
fclose(statfile);
pthread_mutex_unlock( &mux );
pedido=free_data(pedido);
//printf("\nClosing socket %d\n", t.sock);
close(t.sock);
pthread_mutex_lock(&nthreadsmux);
pthread_mutex_lock(&processofilhomux);
if(nthreads > 0)
nthreads--;
//printf("\n [%d] nthreads reduced to %d\n",processofilho, nthreads);
pthread_mutex_unlock(&processofilhomux);
pthread_mutex_unlock(&nthreadsmux);
// Free the thread buffer, we're going to realloc it later
free(tstmp->t_buffer->msg);
free(tstmp->t_buffer);
tstmp->t_buffer = NULL;
pthread_mutex_lock(&athreadmux);
tstmp->active = false;
pthread_mutex_unlock(&athreadmux);
// Return NULL rather than the address of a local variable, which would dangle once the thread exits
pthread_exit(NULL);
}
int main(int argc, char * argv[]){
// Signal handlers
if(signal(SIGTERM, func_sigint) == SIG_ERR)
printf("Erro no sinal SIGTERM");
if(signal(SIGINT, func_sigint) == SIG_ERR)
printf("Erro no sinal SIGINT");
if(signal(SIGUSR1, func_readconfig) == SIG_ERR)
printf("Erro no sinal SIGUSR1");
// Socket
sock = cria_socket(PORT);
conf = read_config("./www.config");
if(conf == NULL)
{
conf = (config*)malloc(sizeof(config));
if(conf == NULL)
{
perror("\nErro (2) ao alocar estrutura config:");
exit(-1);
}
// Set defaults
sprintf(conf->httpdocs, DOCUMENT_ROOT);
sprintf(conf->cgibin, CGI_ROOT);
}
/*
Initially one process is created, which handles X threads/clients.
After each client is served, its thread is destroyed.
Then, if the thread list is empty, the process kills itself, informing the parent.
A new process is only created when there are no free processes (i.e. when there are many concurrent clients).
The parent process creates one thread for each child process it creates,
so that the thread can talk to that child through a pipe.
*/
while (cicle)
{
// Apparently the last process filled up, so we have to create a new one
pthread_mutex_lock(&processomux);
if(wantnewfork)
{
pthread_mutex_unlock(&processomux);
pthread_mutex_lock(&processopaimux);
printf("\n-----------PROCESSO NUM: %d\n", processopai+1);
pthread_mutex_unlock(&processopaimux);
pthread_t forkthread;
pthread_attr_t attr;
pthread_attr_init(&attr);
pthread_attr_setdetachstate(&attr,PTHREAD_CREATE_DETACHED);
process_struct *pstruct = malloc(sizeof(process_struct));
if(pstruct == NULL)
{
perror("\nErro (0) ao alocar estrutura config:");
exit(-1);
}
pstruct->recsock = sock;
pstruct->conf = conf;
//printf("\nServer (1) Socket: %d\n", pstruct->recsock);
if (pthread_create(&forkthread, &attr, gere_processo, (void *)pstruct) != 0)
{
perror("Erro a criar thread gere_processo: ");
exit(-1);
}
pthread_mutex_lock(&processomux);
wantnewfork = false;
pthread_mutex_unlock(&processomux);
}
else
pthread_mutex_unlock(&processomux);
usleep(10000);
}
close(sock);
free(conf);
printf("Sai Pai\n");
return 0;
}
Update
I have a global variable named sock which holds the fd. The pstruct also has a member called recsock, which holds the same value as sock.
Apparently recsock is what gets changed (valgrind doesn't complain), while sock remains unchanged.
free(pst) might be the cause, though I'm unsure why it executes: it shouldn't for the main process. If it does run, that would explain the random exits, and a random exit on any thread other than the 2nd would cause the bad file descriptor issue.
I don't understand why free(pst) would cause this, though: the child should have its own copy of the pst structure. But if I comment out free(pst), the bad file descriptor errors go away. Still, free(pst) should be there; it's the "random thread exit" that makes it run when it shouldn't.
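For reference, here is a minimal sketch of one common way to handle a heap-allocated thread argument, assuming the thread function only needs a private copy of it: copy the struct into a local variable, free the heap block right away, and never touch the pointer again. The names `worker_arg` and `worker` are hypothetical and not taken from the code above; the sketch only illustrates the ownership pattern and is not a diagnosis of the bug.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical argument struct; stands in for something like process_struct. */
typedef struct {
    int recsock;
} worker_arg;

/* Thread entry point: takes ownership of the heap block passed in `p`. */
static void *worker(void *p)
{
    worker_arg local = *(worker_arg *)p; /* private copy on this thread's stack */
    free(p);                             /* freed exactly once, right after the copy */

    /* From here on only `local` is used, so no later exit path has to free anything.
     * The fd is returned only so that the sketch has an observable result. */
    return (void *)(intptr_t)local.recsock;
}
```

It would be started with something like `pthread_create(&tid, NULL, worker, arg)`, where `arg` came from `malloc`; after that call the creating thread must not use or free `arg`.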
Upvotes: 0
Views: 123
Reputation: 54325
If the integer that represents the socket's file descriptor changes suddenly, and your program didn't intentionally change it, then you have one of two bugs:
You're using threads and you aren't locking your threads' access to shared memory. This can lead to things like using a pointer before it has been set to a valid value. If that pointer happens to point at the memory holding the fd, a write through it overwrites the fd.
You have a buffer overflow. Somewhere, your program is writing bad data into the memory that holds the file descriptor.
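As an illustration of the second case, here is a minimal sketch of how a fixed-size table can be overrun when a search for a free entry finds none: the loop index ends up equal to the table size, and writing through it lands past the end of the array, possibly on a neighboring stack variable. The names `slot`, `MAX_SLOTS`, and `find_free_slot` are hypothetical; whether this applies to the `threadtable` search in the question is an assumption that would need checking.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SLOTS 4 /* hypothetical table size */

typedef struct {
    bool active;
} slot;

/* Returns the index of the first inactive slot, or -1 when the table is full.
 * The caller must check for -1 instead of indexing the table blindly. */
static int find_free_slot(const slot *table, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!table[i].active)
            return (int)i;
    }
    return -1; /* no free slot: writing to table[n] would be out of bounds */
}
```

A caller would then refuse the new client (or wait for a slot) when the result is -1, rather than using the loop index as-is.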
To solve the problem I recommend using a debugger and a hardware watchpoint to tell you when the value of the file descriptor changes.
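In gdb, a watchpoint on the variable's address can be set with `watch -l pstruct.recsock` once execution is stopped inside the function that owns it. Since the failing code here runs in a forked child, `set follow-fork-mode child` is probably needed first (both are standard gdb commands, but the exact session is an assumption about this program). When a debugger is impractical, a rough software substitute is to keep a reference copy of the fd and compare against it at checkpoints. The sketch below uses hypothetical names (`fd_canary`, `fd_canary_check`).

```c
#include <stdio.h>

/* Hypothetical helper: holds a reference copy of the descriptor, ideally in
 * static storage so that a stack overrun is less likely to clobber it too. */
typedef struct {
    int expected;
} fd_canary;

/* Returns 1 if `current` still matches the reference copy, 0 otherwise.
 * `where` labels the checkpoint so the first mismatch narrows down the culprit. */
static int fd_canary_check(const fd_canary *c, int current, const char *where)
{
    if (current != c->expected) {
        fprintf(stderr, "fd changed from %d to %d (detected at %s)\n",
                c->expected, current, where);
        return 0;
    }
    return 1;
}
```

Calling it before and after each suspicious block (for example around the code that fills a table entry) brackets the statement that corrupts the value; a hardware watchpoint gives the same answer more directly.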
Upvotes: 1