yulai

Reputation: 761

Socket Programming - Multiple connections: Forking or FD_SET?

I'm trying to understand the different practices in socket programming for handling multiple connections, in particular when a server needs to serve multiple clients.

I have looked at some code examples: some use fd_set (with select()) and others use the fork() system call.

Roughly:

FD_SET

//Variables
fd_set fds, readfds;
struct sockaddr_in clientname;
socklen_t size;
int i;

//bind(...)
//listen(...)
FD_ZERO(&fds);
FD_SET(request_socket, &fds);

while (1) {
    readfds = fds;   /* select() overwrites its set, so copy the master set */
    if (select(FD_SETSIZE, &readfds, NULL, NULL, NULL) < 0) {
        perror("select");   //Something went wrong
        exit(EXIT_FAILURE);
    }

    //Service all sockets with input pending
    for (i = 0; i < FD_SETSIZE; i++) {
        if (FD_ISSET(i, &readfds)) {
            if (i == request_socket) {
                /* Connection request on original socket. */
                int new;
                size = sizeof(clientname);
                new = accept(request_socket, (struct sockaddr *) &clientname, &size);
                if (new < 0) {
                    perror("accept");   //Error
                    continue;
                }

                fprintf(stderr, "Server: connect from host %s, port %hu.\n",
                        inet_ntoa(clientname.sin_addr), ntohs(clientname.sin_port));
                FD_SET(new, &fds);
            }
            else {
                /* Data arriving on an already-connected socket. */
                if (read_from_client(i) < 0) {  //handles queries
                    close(i);
                    FD_CLR(i, &fds);
                }
            }
        }
    }
}

fork()

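//bind()
//listen()

while (1) {
    //Connection establishment
    client_addr_length = sizeof(clientaddr);   /* value-result argument */
    new_socket = accept(request_socket, (struct sockaddr *) &clientaddr, &client_addr_length);

    if (new_socket < 0) {
        error("Error on accepting");
    }

    if ((pid = fork()) < 0) {
        error("Error on fork");
    }
    if (pid == 0) {             /* test the pid we already have; don't fork again */
        /* Child: serves this one client. */
        close(request_socket);
        read_from_client(new_socket);
        close(new_socket);
        exit(0);
    }
    else {
        /* Parent: the child owns the connection now. */
        close(new_socket);
    }
}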

My question is then: what is the difference between the two practices (fd_set and fork)? Is one more suitable than the other?

Upvotes: 2

Views: 4072

Answers (2)

init_js

Reputation: 4591

You would choose between the two approaches, select() or fork(), based on the nature of the IO operations you have to perform once you accept a connection from a client.

Many IO system calls are blocking. While a thread is blocked on IO performed for one client (e.g. connecting to a database or server, reading a file on disk, reading from the network), it cannot serve the other clients' requests. If you create a new process with fork(), each process can block independently without impeding progress on the other connections. Although it may seem advantageous to start a process for each client, this has drawbacks: multiple processes are harder to coordinate and consume more resources. There is no right or wrong approach; it is all about trade-offs.

To understand the trade-offs involved, read about "events vs. threads"; see, for example, Event Loop vs Multithread blocking IO.

The select() system call approach (which you've called the FD_SET approach) would generally be classified as a polling, or I/O multiplexing, approach. With it, a process can wait on events for multiple file descriptors at once, sleep there, and be woken up when activity arises on at least one of the descriptors in the fd_set. See the man page for details (man 2 select). This allows the server process to read from multiple clients bit by bit (but still one at a time), as soon as new data arrives on any socket of interest.

Calling read() on a (blocking) socket that has no data available would block; select() just makes sure you only call it on sockets that have data available. It is generally called in a loop so that the process comes back for the next piece of work. Writing the program in that style often forces you to handle requests iteratively and carefully, because you want to avoid blocking in your single process: one blocking call stalls every client that process serves.
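
A common complement to select() (not something this answer prescribes) is to make client sockets non-blocking, so that a stray read on a socket with no data returns immediately with EAGAIN/EWOULDBLOCK instead of stalling every other client. Below is a minimal sketch assuming POSIX fcntl() and recv(); the helper names and the -2 "no data yet" return value are hypothetical conventions.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put fd into non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Receive whatever is already queued on a non-blocking socket without
   sleeping. Returns the number of bytes read, 0 on orderly shutdown (EOF),
   -2 if no data is ready yet (go back to select()), or -1 on a real error
   (errno is left set by recv()). */
ssize_t recv_available(int fd, char *buf, size_t len)
{
    ssize_t n = recv(fd, buf, len, 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

In a select() loop, a -2 simply means "nothing to do for this client right now", and the loop moves on to the next ready descriptor.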

fork() (man 2 fork) creates a child process. Child processes are created with a copy of the file descriptors open in the parent, which explains all the fd-closing business after the call returns: the child closes the listening socket it does not need, and the parent closes its copy of the client socket. Once you have a child process to take care of the client's socket, you can write straightforward linear code with blocking calls without affecting the other connections (because those are handled in parallel by other child processes of the server).
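
A minimal sketch of the per-client fork() pattern, assuming POSIX fork() and waitpid(). `spawn_client_handler`, `greet_client` and `reap_children` are hypothetical names. The sketch shows the descriptor hand-off described above, and also reaps exited children, which the question's loop does not do (without it, finished children linger as zombies).

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical per-client handler: blocking, linear code for ONE client. */
typedef void (*client_handler)(int fd);

/* Example handler (hypothetical): greet the client and return. */
void greet_client(int fd)
{
    send(fd, "hi", 2, 0);
}

/* Fork a child that serves conn_fd with handler(). The child inherits a
   copy of every open descriptor, so each side closes the one it does not
   need. Returns the child's pid to the parent, or -1 if fork() failed
   (the caller then still owns conn_fd). The child never returns. */
pid_t spawn_client_handler(int listen_fd, int conn_fd, client_handler handler)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: it never calls accept(), so drop the listening socket. */
        if (listen_fd >= 0)
            close(listen_fd);
        handler(conn_fd);
        close(conn_fd);
        _exit(0);   /* _exit: don't flush stdio buffers copied from the parent */
    }
    /* Parent: close its copy, otherwise the client would not see EOF
       until the parent also closed the descriptor. */
    close(conn_fd);
    return pid;
}

/* Collect children that have already exited so they don't pile up as
   zombies. Returns the number of children reaped. */
int reap_children(void)
{
    int n = 0;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        n++;
    return n;
}
```

In the question's accept loop, the call would be `spawn_client_handler(request_socket, new_socket, some_handler)`, with `reap_children()` called once per iteration or from a SIGCHLD handler.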

Upvotes: 5

Arun Kaushal

Reputation: 631

The main difference between the two practices is the number of processes used to handle multiple connections. With select, a single process (in fact, a single thread) can handle concurrent connections from multiple clients. With the fork-based approach, a new process is created for every new connection. So if there are N concurrent client connections, there will be N child processes handling them, plus the parent that accepts new connections.

When we use select, we don't need to worry about shared memory or synchronization, because everything happens within the same thread of execution.

On the other hand, when we use select, we need to be more careful while coding, as the same thread of execution handles multiple clients. In the fork-based approach, each child process handles only a single client, so it tends to be a bit easier to implement.

With the fork-based approach, we end up using more system resources (memory, process-table entries, context switches) because we create more processes.

The choice of approach depends on the application: the expected number of connections, the nature of the connections (persistent or short-lived), whether connection handlers need to share data, etc.

Upvotes: 1
