jsab

Reputation: 11

Linux TCP Server Issue C++

I have been trying to figure out this problem for over a month now and have nowhere else to turn. I have a server that listens to many multicast channels (about 100). Each multicast socket has its own thread. A single-threaded client listener in the same server handles all incoming connections, disconnects, and client messaging. The idea is that a client connects, requests data from a multicast channel, and the server relays the UDP data back to it for as long as it stays connected. The client can request either UDP or TCP as the protocol for the data relay. At one point this worked beautifully for a couple of weeks. We then made some code and kernel changes, and now we can't figure out what has gone wrong.

The server will run for hours and have hundreds of clients connected throughout the day. But at some point, seemingly at random, the server just stops. By "stops" I mean: all UDP sockets stop receiving and handling data (tcpdump shows data still arriving at the box), and the client_listener thread stops receiving client packets. However, the main client_listener socket can still receive new connections and disconnects. On a new connection, the main socket is able to send a "Connection Established" packet back to the client, but when the client responds, select() never returns.

I can post code if that would help. If anyone has suggestions on where to look, or if this sounds like a known problem, please let me know.

If you have any questions, please ask.

Thank you.


Here is my TCP server code. It runs in a single thread. It works fine for hours, and then I only receive "New Connection" and "Disconnect" events; no client packets come in.

  int opt = 1;
  int addrlen;
  int sd;
  int max_sd;
  int valread;
  int activity;
  int new_socket;
  char buffer[MAX_BUFFER_SIZE];
  int client_socket[m_max_clients];
  struct sockaddr_in address;

  fd_set readfds;
  for(int i = 0; i<m_max_clients; i++)
  {
    client_socket[i]=0;
  }

  if((m_master_socket = socket(AF_INET,SOCK_STREAM,0))==0)
    LOG(FATAL)<<"Unable to create master socket";

  if(setsockopt(m_master_socket,SOL_SOCKET,SO_REUSEADDR,(char*)&opt,sizeof(opt))<0)
    LOG(FATAL)<<"Unable to set master socket";

  address.sin_family = AF_INET;
  address.sin_addr.s_addr = INADDR_ANY;
  address.sin_port = htons(m_listenPort);

  if(bind(m_master_socket,(struct sockaddr*)& address, sizeof(address))!=0)
    LOG(FATAL)<<"Unable to bind master socket";

  if(listen(m_master_socket,SOMAXCONN)!=0)
    LOG(FATAL)<<"listen() failed with err";

  addrlen = sizeof(address);
  LOG(INFO)<<"Waiting for connections......";

  while(true)
  {
    FD_ZERO(&readfds);

    FD_SET(m_master_socket, &readfds);
    max_sd = m_master_socket;

    for(int i = 0; i<m_max_clients; i++)
    {
      sd = client_socket[i];

      if(sd > 0)
        FD_SET(sd, &readfds);

      if(sd>max_sd)
        max_sd = sd;
    }

    activity = select(max_sd+1,&readfds,NULL,NULL,NULL);

    if((activity<0)&&(errno!=EINTR))
    {
      // int err = errno;
      // LOG(ERROR)<<"SELECT ERROR:"<<activity<<" "<<err;
      continue;
    }

    if(FD_ISSET(m_master_socket, &readfds))
    {
      if((new_socket = accept(m_master_socket,(struct sockaddr*)&address, (socklen_t*)&addrlen))<0)
        LOG(FATAL)<<"ERROR:ACCEPT FAILED!";

      LOG(INFO)<<"New Connection, socket fd is (" << new_socket << ") client_addr:" << inet_ntoa(address.sin_addr) << " Port:" << ntohs(address.sin_port);
      for(int i =0;i<m_max_clients;i++)
      {
        if(client_socket[i]==0)
        {
          //try to set the socket to non blocking, tcp nagle and keep alive
          if ( !SetSocketBlockingEnabled(new_socket, false) )
            LOG(INFO)<<"UNABLE TO SET NON-BLOCK: ("<<new_socket<<")" ;
          if ( !SetSocketNoDelay(new_socket,false) )
            LOG(INFO)<<"UNABLE TO SET DELAY: ("<<new_socket<<")" ;
//           if ( !SetSocketKeepAlive(new_socket,true) )
//            LOG(INFO)<<"UNABLE TO SET KeepAlive: ("<<new_socket<<")" ;

          ClientConnection* con = new ClientConnection(m_mocSrv, m_udpPortGenerator, inet_ntoa(address.sin_addr), ntohs(address.sin_port), new_socket);
          if(con->login())
          {
            client_socket[i] = new_socket;
            m_clientConnectionSocketMap[new_socket] = con;
            LOG(INFO)<<"Client Connection Logon Complete";
          }
          else
            delete con;
          break;
        }
      }//for
    }
    else
    {
      try{
        for(int i = 0; i<m_max_clients; i++)
        {
          sd = client_socket[i];
          if(FD_ISSET(sd,&readfds))
          {
            if ( (valread = recv(sd, buffer, sizeof(buffer),MSG_DONTWAIT|MSG_NOSIGNAL)) <= 0 )
            {
             //remove from the fd listening set
              LOG(INFO)<<"RESET CLIENT_SOCKET:("<<sd<<")";
              client_socket[i]=0;
              handleDisconnect(sd,true);
           }
           else
           {
             std::map<int, ClientConnection*>::iterator client_connection_socket_iter = m_clientConnectionSocketMap.find(sd);
             if(client_connection_socket_iter != m_clientConnectionSocketMap.end())
             {
               client_connection_socket_iter->second->handle_message(buffer, valread);
               if(client_connection_socket_iter->second->m_logoff)
               {
                  LOG(INFO)<<"SOCKET LOGGED OFF:"<<sd;
                  client_socket[i]=0;
                  handleDisconnect(sd,true);
               }
             }
             else
             {
                LOG(ERROR)<<"UNABLE TO FIND SOCKET DESCRIPTOR:"<<sd;
             }
           }
          }
        }
      }catch(...)
      {
        LOG(ERROR)<<"EXCEPTION CATCH!!!";
      }
    }
  }

Upvotes: 1

Views: 465

Answers (1)

Jonathan Eustace

Reputation: 2489

From the information given I would state the following:

  • Do not use a thread for each connection. Since you're on Linux, use epoll in edge-triggered mode for I/O multiplexing. Many newer web frameworks use this approach. For more background, look up the C10K problem. By taking threads out of the equation you eliminate the possibility of a deadlock and reduce the complexity of debugging and of worrying about thread-safe variables.
  • Ensure each connection when finished is completely closed.
  • Ensure that you do not have some new firewall rules that popped up in iptables since the upgrade.
  • Check any firewalls on the network to see if they are restricting certain types of activity (is your server on a new IP since the upgrade?)
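
The second point above (making sure every finished connection is completely closed) can be sketched with a small RAII wrapper. This is only an illustration: UniqueFd is a hypothetical helper, not something from the question's code, and it assumes the descriptor has a single owner that nothing else closes.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not part of the question's code): owns one file
// descriptor and closes it exactly once when the owner goes away.
class UniqueFd {
public:
    explicit UniqueFd(int fd = -1) : fd_(fd) {}
    ~UniqueFd() { reset(); }

    // Non-copyable so two objects can never close the same descriptor.
    UniqueFd(const UniqueFd&) = delete;
    UniqueFd& operator=(const UniqueFd&) = delete;

    // Movable: ownership transfers and the source is left empty.
    UniqueFd(UniqueFd&& other) noexcept : fd_(other.release()) {}
    UniqueFd& operator=(UniqueFd&& other) noexcept {
        if (this != &other) reset(other.release());
        return *this;
    }

    int get() const { return fd_; }
    bool valid() const { return fd_ >= 0; }

    // Give up ownership without closing.
    int release() {
        int fd = fd_;
        fd_ = -1;
        return fd;
    }

    // Close the current descriptor (if any) and optionally adopt a new one.
    void reset(int fd = -1) {
        // Re-adopting the descriptor we already own would close it under us.
        assert(fd < 0 || fd != fd_);
        if (fd_ >= 0) {
            ::shutdown(fd_, SHUT_RDWR);  // fails harmlessly on non-sockets
            ::close(fd_);
        }
        fd_ = fd;
    }

private:
    int fd_;
};
```

With something like this, a per-client object could hold a UniqueFd member, so deleting the client object also releases its socket, including on early-return paths such as a failed login.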

In short, I would put my money on a thread deadlock and/or starvation. I've personally run experiments comparing a multithreaded server against a single-threaded server using epoll. The results were night and day: epoll blows away the multithreaded implementation (for I/O) and makes the code simpler to write, debug, and maintain.
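
For reference, a minimal sketch of what an edge-triggered epoll read path might look like. The function names (set_nonblocking, watch_fd, drain_fd, poll_once) are made up for illustration and are not from the question's code. The sketch only collects bytes into a string and assumes a real server would dispatch them to its own per-client handler instead.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

// Edge-triggered epoll requires non-blocking descriptors.
bool set_nonblocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    return flags >= 0 && fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Register fd for edge-triggered read and peer-hangup notifications.
bool watch_fd(int epfd, int fd) {
    assert(epfd >= 0 && fd >= 0);
    epoll_event ev{};
    ev.events = EPOLLIN | EPOLLET | EPOLLRDHUP;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0;
}

// Read until the kernel reports EAGAIN. With EPOLLET this is mandatory:
// leftover bytes do not trigger another event.
// Returns false if the peer closed the connection or a real error occurred.
bool drain_fd(int fd, std::string& out) {
    char buf[4096];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0) {
            out.append(buf, static_cast<size_t>(n));
            continue;
        }
        if (n == 0) return false;      // orderly shutdown by the peer
        if (errno == EINTR) continue;  // interrupted, retry
        return errno == EAGAIN || errno == EWOULDBLOCK;
    }
}

// One pass of the event loop: wait up to timeout_ms, drain every ready
// descriptor, and unregister and close the ones that are finished.
// Returns the number of ready descriptors, or -1 if epoll_wait failed.
int poll_once(int epfd, int timeout_ms, std::string& out) {
    epoll_event events[64];
    int n = epoll_wait(epfd, events, 64, timeout_ms);
    for (int i = 0; i < n; ++i) {
        int fd = events[i].data.fd;
        if (!drain_fd(fd, out)) {
            epoll_ctl(epfd, EPOLL_CTL_DEL, fd, nullptr);
            close(fd);
        }
    }
    return n;
}
```

A server would create the epoll instance once with epoll_create1(0), call watch_fd for the listening socket and for each accepted client, and then call something like poll_once in a loop. Unlike select(), this path has no FD_SETSIZE ceiling on descriptor values.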

Upvotes: 2
