Reputation: 11
I have been trying to figure out this problem for over a month now. I have no where else to turn. I have a server that listens to many multicast channels (100ish). Each socket is its own thread. Then I have a client listener (single threaded) that handles all incoming connections, disconnects, and client messaging within the same server. The idea is that a client comes in, connects, requests data from a multicast channels and I send the data back to the client. The client stays connected and I relay the UDP data back to the client. The client can either request UDP or TCP has the protocol for the data relay. At one point this was working beautifully for a couple of weeks. We did some code and kernel changes, and now we cant figure out whats gone wrong.
The server will run for hours and have hundreds of clients connected throughout the day. But at some point, randomly, the server will just stop. And by stop, I mean: all UDP sockets stop receiving/handling data (tcpdump shows data still coming to the box), the client_listener thread stops receiving client packets. BUT!!! the main client_listener socket can still receive new connections and new disconnects on the main socket. On a new connection, the main sockets is able to send a "Connection Established" packet back to the client, but then when the client responds, the select never returns.
I can post code if someone would like. If anyone has any suggestions where to look or if this sounds like something. Please let me know.
If you have any questions, please ask.
Thank you.
I would like to share my TCP Server code: This is a single thread. Works fine for hours and then I will only receive "New Connections" and "Disconnects". NO CLIENT PACKETS WILL COME IN.
int opt = 1;
int addrlen;
int sd;
int max_sd;
int valread;
int activity;
int new_socket;
char buffer[MAX_BUFFER_SIZE];
int client_socket[m_max_clients];
struct sockaddr_in address;
fd_set readfds;
for(int i = 0; i<m_max_clients; i++)
{
client_socket[i]=0;
}
if((m_master_socket = socket(AF_INET,SOCK_STREAM,0))==0)
LOG(FATAL)<<"Unable to create master socket";
if(setsockopt(m_master_socket,SOL_SOCKET,SO_REUSEADDR,(char*)&opt,sizeof(opt))<0)
LOG(FATAL)<<"Unable to set master socket";
address.sin_family = AF_INET;
address.sin_addr.s_addr = INADDR_ANY;
address.sin_port = htons(m_listenPort);
if(bind(m_master_socket,(struct sockaddr*)& address, sizeof(address))!=0)
LOG(FATAL)<<"Unable to bind master socket";
if(listen(m_master_socket,SOMAXCONN)!=0)
LOG(FATAL)<<"listen() failed with err";
addrlen = sizeof(address);
LOG(INFO)<<"Waiting for connections......";
while(true)
{
FD_ZERO(&readfds);
FD_SET(m_master_socket, &readfds);
max_sd = m_master_socket;
for(int i = 0; i<m_max_clients; i++)
{
sd = client_socket[i];
if(sd > 0)
FD_SET(sd, &readfds);
if(sd>max_sd)
max_sd = sd;
}
activity = select(max_sd+1,&readfds,NULL,NULL,NULL);
if((activity<0)&&(errno!=EINTR))
{
// int err = errno;
// LOG(ERROR)<<"SELECT ERROR:"<<activity<<" "<<err;
continue;
}
if(FD_ISSET(m_master_socket, &readfds))
{
if((new_socket = accept(m_master_socket,(struct sockaddr*)&address, (socklen_t*)&addrlen))<0)
LOG(FATAL)<<"ERROR:ACCEPT FAILED!";
LOG(INFO)<<"New Connection, socket fd is (" << new_socket << ") client_addr:" << inet_ntoa(address.sin_addr) << " Port:" << ntohs(address.sin_port);
for(int i =0;i<m_max_clients;i++)
{
if(client_socket[i]==0)
{
//try to set the socket to non blocking, tcp nagle and keep alive
if ( !SetSocketBlockingEnabled(new_socket, false) )
LOG(INFO)<<"UNABLE TO SET NON-BLOCK: ("<<new_socket<<")" ;
if ( !SetSocketNoDelay(new_socket,false) )
LOG(INFO)<<"UNABLE TO SET DELAY: ("<<new_socket<<")" ;
// if ( !SetSocketKeepAlive(new_socket,true) )
// LOG(INFO)<<"UNABLE TO SET KeepAlive: ("<<new_socket<<")" ;
ClientConnection* con = new ClientConnection(m_mocSrv, m_udpPortGenerator, inet_ntoa(address.sin_addr), ntohs(address.sin_port), new_socket);
if(con->login())
{
client_socket[i] = new_socket;
m_clientConnectionSocketMap[new_socket] = con;
LOG(INFO)<<"Client Connection Logon Complete";
}
else
delete con;
break;
}
}//for
}
else
{
try{
for(int i = 0; i<m_max_clients; i++)
{
sd = client_socket[i];
if(FD_ISSET(sd,&readfds))
{
if ( (valread = recv(sd, buffer, sizeof(buffer),MSG_DONTWAIT|MSG_NOSIGNAL)) <= 0 )
{
//remove from the fd listening set
LOG(INFO)<<"RESET CLIENT_SOCKET:("<<sd<<")";
client_socket[i]=0;
handleDisconnect(sd,true);
}
else
{
std::map<int, ClientConnection*>::iterator client_connection_socket_iter = m_clientConnectionSocketMap.find(sd);
if(client_connection_socket_iter != m_clientConnectionSocketMap.end())
{
client_connection_socket_iter->second->handle_message(buffer, valread);
if(client_connection_socket_iter->second->m_logoff)
{
LOG(INFO)<<"SOCKET LOGGED OFF:"<<sd;
client_socket[i]=0;
handleDisconnect(sd,true);
}
}
else
{
LOG(ERROR)<<"UNABLE TO FIND SOCKET DESCRIPTOR:"<<sd;
}
}
}
}
}catch(...)
{
LOG(ERROR)<<"EXCEPTION CATCH!!!";
}
}
}
Upvotes: 1
Views: 465
Reputation: 2489
From the information given I would state the following:
In short I would put my money on a thread deadlock and / or starvation. I've personally conducted experiments in which I've created a multithreaded server vs a single threaded server using Epoll. The results where night and day, Epoll blows away multithreaded implementation (for I/O) and makes the code simpler to write, debug and maintain.
Upvotes: 2