What are the disadvantages of hosting a small server application using only UDP?

Question

I'm working on a routing simulator where nodes connect to a master routing manager to get their initial information, and then subsequently start to converge their internal routing tables with other virtual nodes.

ninja edit: I should note that all of my testing currently is local, with multiple terminals up. However, it's expected that this could work with multiple non-local nodes.

For my manager, I'm just using this:

int
RoutingManager::Initialize(int myPort)
{
    int length;
    struct sockaddr_in server;

    // One UDP socket carries all traffic between the manager and the nodes.
    mySocket = socket(AF_INET, SOCK_DGRAM, 0);
    if (mySocket < 0) {
        perror("Opening socket");
        return -1;
    }

    length = sizeof(server);
    bzero(&server,length);

    server.sin_family = AF_INET;
    server.sin_addr.s_addr = htonl(INADDR_ANY);
    server.sin_port = htons(myPort);

    if (bind(mySocket,(struct sockaddr *)&server,length) < 0) {
        perror("binding");
        return -1;
    }

    return 0;
}

Where I store mySocket and use it for all communication. Whenever I receive a new message from recvfrom(), I just parse that address structure, and save it to a container:

cout << "Waiting for nodes...\n";
n = recvfrom(mySocket,buffer,1024,0,(struct sockaddr *)&newNode, &length);
[...]
map >::iterator iter;
iter = topology.begin();
if(!iter->second.online)
    {
        activeNodeCount++;

        iter->second.online = true;

    //connection here is the NodeConnection structure below
        iter->second.connection.theirAddress = newNode;
        iter->second.connection.ipstr = inet_ntop(AF_INET, &newNode.sin_addr, ip4, INET_ADDRSTRLEN);
        iter->second.connection.port = newNode.sin_port;

        activeNodes.push_back(newNode);

[...]

struct NodeConnection
{
    struct sockaddr_in theirAddress;
    const char* ipstr;
    unsigned short int port;
};

struct Node
{
    Node(){online = false;}
    int id;
    bool online;

    //this node's known neighbors
    std::map neighbors;

    //this node's connection information
    struct NodeConnection connection;
};

Whenever I need to send data to a certain node, I just look up its information in the container and do a sendto(). When I receive data, I just check which port it came in on and look it up in my internal node-map. I mainly set it up this way because binding sockets got really confusing really fast and it seemed like the setup for TCP was a bit more involved. I feel like this is the wrong approach though, even for something as small as a networking project for class - but why? What's the better alternative here?

Maybe the issue is that I don't fully understand how to reliably create and persist multiple sockets within my server. Would I be better off binding multiple TCP connections to each node, and running UDP between the nodes themselves? If I did this, I assume I would have to create a new socket for each node, and bind it accordingly - thus keeping a record of the socket and the sockaddr structure information for sending data to that node?

Damon · Accepted Answer

Generally, what you are doing is a correct approach (not the one correct, but one correct approach). You could do it with TCP as well, which would make message reliability easier (if needed) at the price of making socket management slightly more complicated. The client code would likely be easier with TCP as well (not so the server!).

Conceptually, TCP is the "easier to grok" of the two protocols for one connection since it simulates a reliable in-order stream on top of IP, which you can consider pretty much like a file on your hard disk (except you cannot seek backwards) that you read from and write to.

On the other hand, several TCP connections mean you need one socket for every connection, and you must somehow deal with the fact that you can only read from one at a time. If no data is available, your thread blocks¹, which means you also can't read data that would maybe be available on a different socket -- something must be done about that.

The two solutions² are either to run one thread (or process) per connection, which is fine for a small number but does not scale well, or to multiplex using a function like select or poll. When these multiplexing functions tell you that data (or a new connection) is available on a particular socket (and only then!), you read it.
Also, the way connections are "created" on the server side is not very intuitive if you don't know how it works. It isn't very complicated, but it sure wasn't what I expected when I first learned about how sockets work. You first create a socket that you bind to and listen on, and then you accept connections. The accept function leaves the listening socket as it is, and returns a different socket that refers only to one connection with another host.
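
To make the bind/listen/accept sequence and the multiplexing a bit more concrete, here is a minimal sketch, assuming a POSIX system. The names make_listener and poll_once are made up for illustration, and the backlog of 16 and the 1024-byte buffer are arbitrary choices, not recommendations.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>
#include <vector>

// Create a TCP socket that listens on the given port (0 = let the OS choose).
// Returns the listening descriptor, or -1 on error.
int make_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    int yes = 1;
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes));

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// One pass of the event loop: wait up to timeout_ms for activity, read from
// every client that has data, and accept a pending connection if there is one.
// Returns the number of bytes read in this pass, or -1 if poll() failed.
int poll_once(int listener, std::vector<int>& clients, int timeout_ms)
{
    assert(listener >= 0);

    std::vector<pollfd> fds;
    fds.push_back({listener, POLLIN, 0});
    for (int c : clients) fds.push_back({c, POLLIN, 0});

    if (poll(fds.data(), fds.size(), timeout_ms) < 0) return -1;

    int total = 0;
    std::vector<int> still_open;
    for (std::size_t i = 1; i < fds.size(); ++i) {
        bool keep = true;
        if (fds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
            char buf[1024];
            ssize_t n = recv(fds[i].fd, buf, sizeof(buf), 0);
            if (n <= 0) {              // peer closed the connection, or error
                close(fds[i].fd);
                keep = false;
            } else {
                total += static_cast<int>(n);   // may be only part of a request
            }
        }
        if (keep) still_open.push_back(fds[i].fd);
    }

    // The listening socket is "readable" when a connection is waiting.
    // accept() returns a new socket for that one connection.
    if (fds[0].revents & POLLIN) {
        int c = accept(listener, nullptr, nullptr);
        if (c >= 0) still_open.push_back(c);
    }

    clients.swap(still_open);
    return total;
}
```

A server would call make_listener once and then poll_once in a loop. Note that the listening socket stays in the poll set for the whole lifetime of the server, while the per-connection sockets come and go.
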
Lastly, your server must be prepared for partial requests. While TCP guarantees that data will (eventually) arrive, and in-order, it does not guarantee that it will arrive in one chunk. You might receive your requests 1-2 bytes at a time (in practice you won't normally see single bytes, but you must be prepared for it, as it can happen). Your application needs to keep receiving bits and pieces of data and collect them (in a string or similar) until it has enough together for a complete request.
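
Collecting the bits and pieces could look roughly like the sketch below. It assumes that every request ends with a newline, which is purely an assumption made for this example (a length prefix is another common framing choice), and RequestAssembler is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collects bytes from successive recv() calls and hands back complete
// requests. Assumption for this sketch: each request ends with '\n'.
class RequestAssembler {
public:
    // Append a chunk exactly as recv() delivered it and return every request
    // that is now complete (without the trailing '\n').
    std::vector<std::string> feed(const char* data, std::size_t len)
    {
        assert(data != nullptr);
        pending_.append(data, len);

        std::vector<std::string> complete;
        std::size_t start = 0;
        std::size_t end;
        while ((end = pending_.find('\n', start)) != std::string::npos) {
            complete.push_back(pending_.substr(start, end - start));
            start = end + 1;
        }
        pending_.erase(0, start);   // keep only the unfinished tail
        return complete;
    }

    // Number of bytes received so far that do not yet form a full request.
    std::size_t buffered() const { return pending_.size(); }

private:
    std::string pending_;
};
```

You would keep one such object per TCP connection and feed it whatever each recv() call returns, whether that is half a request or three requests at once.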

UDP, on the other hand, has the advantage of simplicity. You have one socket, no more and no less. Everything from any number of clients arrives at that one socket. You need not multiplex, but can in the simplest case just read from the socket and block until something comes in. There is no connection establishment. It doesn't matter whether you have one client or a thousand of them. Also, you always get a complete datagram at a time, never a partial request. All or nothing.
Since recvfrom necessarily tells you where the datagram came from (you would otherwise have no way of telling), you already have the sockaddr that you need for sendto (no need to look that up anywhere!). You only need to do a lookup if there is some other information that you need in order to send an answer.
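
As a minimal sketch of that idea (POSIX sockets assumed; make_udp_socket and echo_once are names invented for this example), the address filled in by recvfrom is passed straight back to sendto:

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>

// Open a UDP socket bound to the given port on all interfaces
// (0 = let the OS choose). Returns the descriptor, or -1 on error.
int make_udp_socket(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Receive one datagram and send it straight back to whoever sent it.
// Returns the number of bytes echoed, or -1 on error.
ssize_t echo_once(int fd)
{
    assert(fd >= 0);

    char buf[1024];
    sockaddr_in peer{};
    socklen_t peer_len = sizeof(peer);   // must be a socklen_t, not an int

    ssize_t n = recvfrom(fd, buf, sizeof(buf), 0,
                         reinterpret_cast<sockaddr*>(&peer), &peer_len);
    if (n < 0) return -1;

    // peer now holds the sender's address and port; no lookup is required.
    return sendto(fd, buf, static_cast<size_t>(n), 0,
                  reinterpret_cast<sockaddr*>(&peer), peer_len);
}
```

If you do keep a table of nodes, keying it on both the address and the port from that sockaddr_in (rather than the port alone) avoids collisions once nodes run on different machines.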

UDP, however, has two disadvantages compared to TCP, which are related to each other and which can become very significant. First of all, UDP is unreliable: it does not guarantee delivery of your packets. While this sounds scary, as if you would always lose maybe 5-10% or so of your traffic, that is normally not the case. Packets don't just get lost or scrambled on the wire for no reason (not normally, anyway). Network traffic is surprisingly resilient, much more than you would think (even more so as some wire protocols, e.g. ATM, will use forward error correction).

However, UDP also does not do any congestion control, and that is where it gets troublesome. Whenever you send out data on a socket, it's sent, unconditionally. Your Ethernet (or similar network) card will make sure that the datagram makes it to the wire reliably. But eventually, as you send large bulks of data, there will be a router between you and your destination which cannot keep up with the number of packets for some reason (maybe because you send too fast, or because someone else is sending something completely unrelated; the reason does not matter a lot). At that point, the router will do the only thing it can do: it will throw packets away. They never arrive at the other end, and nobody will tell you.
Further, it is possible that the other end is not able to process the packets you send fast enough. Receive buffers have a limited size (usually something around 64-128 kilobytes), and once the receive buffer is full... you guessed it, the packets will simply be thrown away. Again, you suffer packet loss, and nobody is telling you! What's worse in that case, they're perfectly good packets, and they arrived just fine, but the application on the other host still isn't going to see them.
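
If you want to see what your own system uses, the receive buffer size can be queried (and a larger one requested) through the SO_RCVBUF socket option. A small sketch, assuming POSIX sockets; the two function names are made up, and the value the kernel reports back may differ from what you asked for:

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>

// Return the kernel receive-buffer size (in bytes) for a socket,
// or -1 on error.
int receive_buffer_size(int fd)
{
    int size = 0;
    socklen_t len = sizeof(size);
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &len) < 0) return -1;
    return size;
}

// Ask the kernel for a larger receive buffer. The kernel may clamp or adjust
// the request, so the value actually in effect is read back and returned.
int request_receive_buffer(int fd, int bytes)
{
    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    return receive_buffer_size(fd);
}
```

A bigger buffer only absorbs longer bursts. If the receiver is slower than the sender on average, the buffer still fills up and datagrams are still dropped.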

Which leads to the single most important thing to remember: Don't send any faster than the other side (and any router in between) can cope with.
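
One simple way to hold yourself to that rule is to pace the sender. The sketch below uses a token bucket, which is a technique picked for illustration here and not something UDP or the sockets API gives you. The class name, the rate and the burst size are all assumptions, and the caller supplies a monotonic time in seconds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Token bucket: tokens accumulate at `rate` bytes per second up to `burst`
// bytes, and sending a datagram spends as many tokens as it has bytes.
class TokenBucket {
public:
    TokenBucket(double rate_bytes_per_sec, double burst_bytes)
        : rate_(rate_bytes_per_sec), burst_(burst_bytes),
          tokens_(burst_bytes), last_(0.0)
    {
        assert(rate_ > 0.0 && burst_ > 0.0);
    }

    // `now` is a monotonic timestamp in seconds supplied by the caller.
    // Returns true if a datagram of `size` bytes may be sent right now;
    // on false the caller should wait and try again later.
    bool try_send(std::size_t size, double now)
    {
        tokens_ = std::min(burst_, tokens_ + (now - last_) * rate_);
        last_ = now;
        if (tokens_ < static_cast<double>(size)) return false;
        tokens_ -= static_cast<double>(size);
        return true;
    }

private:
    double rate_;     // refill rate in bytes per second
    double burst_;    // maximum number of tokens that can accumulate
    double tokens_;   // tokens currently available
    double last_;     // timestamp of the previous call
};
```

You would call try_send before each sendto and only send when it returns true. This limits your own rate, but it does not tell you what rate the path can actually sustain; for that you need feedback from the receiver.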

TCP deals with this by having the other side acknowledge that it has received what you sent (and resending if that doesn't happen), and by limiting the number of packets it can send out in one go before receiving an acknowledgement. After that, it stops sending until acknowledgements are received, and eventually grows its window using a more or less evolved algorithm (in fact it does a lot more, but that gets too complicated).

If you need to be able to rely on whatever you sent being received, you will have to do something similar, but not necessarily with an equally complicated algorithm (you also might or might not care about in-order delivery, or duplicate packets, or other things).
On the other hand, it may be perfectly acceptable to lose a packet once in a while, and you may not need to do anything special at all. It depends on what you really need/want.
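
If you do need acknowledgements, the bookkeeping can stay fairly small. The following is only a sketch of one possible scheme (a fixed window with a fixed timeout, far simpler than what TCP does). AckTracker, its method names and the idea of putting a 32-bit sequence number in each datagram are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Tracks datagrams that were sent but not yet acknowledged.
class AckTracker {
public:
    AckTracker(std::size_t window, double timeout_sec)
        : window_(window), timeout_(timeout_sec), next_seq_(0)
    {
        assert(window_ > 0 && timeout_ > 0.0);
    }

    // True if another datagram may be sent without exceeding the window.
    bool can_send() const { return in_flight_.size() < window_; }

    // Record a newly sent datagram and return the sequence number that
    // should go into its header. `now` is a monotonic time in seconds.
    std::uint32_t on_send(double now)
    {
        assert(can_send());
        in_flight_[next_seq_] = now;
        return next_seq_++;
    }

    // The receiver acknowledged `seq`. Unknown or duplicate acks are ignored.
    void on_ack(std::uint32_t seq) { in_flight_.erase(seq); }

    // Sequence numbers whose acknowledgement is overdue. The caller resends
    // those datagrams; their timers are restarted here.
    std::vector<std::uint32_t> due_for_resend(double now)
    {
        std::vector<std::uint32_t> due;
        for (auto& entry : in_flight_) {
            if (now - entry.second >= timeout_) {
                due.push_back(entry.first);
                entry.second = now;
            }
        }
        return due;
    }

private:
    std::size_t window_;                          // max unacknowledged datagrams
    double timeout_;                              // seconds before a resend
    std::uint32_t next_seq_;                      // next sequence number to use
    std::map<std::uint32_t, double> in_flight_;   // seq -> time last sent
};
```

The receiving side would answer each datagram with a tiny datagram carrying the same sequence number, and it has to tolerate duplicates, since a resend can arrive after the original did.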

¹ Well, not necessarily. It is possible to set a socket to non-blocking, but busy waiting until something finally arrives is very inefficient.
² Yes, OK, there are more than two... signal-driven I/O (Unix) or overlapped I/O (Windows) being examples. But the two methods mentioned above are the ones that are understandable and portable.
