Reputation: 141
While stress testing some server code I wrote, I noticed that even though I call close() on the descriptor (and check the result for errors), the descriptor is not released. Eventually accept() fails with the error "Too many open files".
I understand that this is the ulimit being hit. What I don't understand is why I am hitting it when I call close() after each synchronous accept/read/send cycle.
I am validating that the descriptors are in fact there by running a watch with lsof:
ctsvr 9733 mike 1017u sock 0,7 0t0 3323579 can't identify protocol
ctsvr 9733 mike 1018u sock 0,7 0t0 3323581 can't identify protocol
...
And sure enough there are about 1000 or so of them. Furthermore, checking with netstat I can see that there are no hanging TCP states (no WAIT or STOPPED or anything).
If I do just a single connect/send/recv from the client, the socket still stays listed in lsof, so this is not even a load issue.
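The lsof check above can also be done from inside the process, which makes it easy to log the descriptor count after each request. This is only a sketch: it assumes a Linux system with /proc mounted, and count_open_fds is a made-up helper name, not something from the server code.

```cpp
#include <dirent.h>

// Hypothetical helper: counts this process's open descriptors by listing
// /proc/self/fd (Linux only). Returns -1 if the directory cannot be opened.
int count_open_fds() {
    DIR* dir = ::opendir("/proc/self/fd");
    if (!dir) return -1;

    int count = 0;
    while (dirent* entry = ::readdir(dir)) {
        // Skip the "." and ".." entries; every other entry is a descriptor.
        if (entry->d_name[0] != '.') ++count;
    }
    ::closedir(dir);

    // The directory stream itself held one descriptor while we were counting.
    return count - 1;
}
```

If the value keeps growing across accept/read/send cycles, something is still holding descriptors open even though close() is being called.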
The server is running on an Ubuntu Linux 64-bit machine.
Any thoughts?
Upvotes: 9
Views: 7414
Reputation: 23
Have you tried calling perror() after close()? The message it prints may give you some help.
Upvotes: 1
Reputation: 141
So using strace (thanks Gearoid), which I have no idea how I ever lived without, I noted I was in fact closing the descriptors.
However, for the sake of posterity, I lay bare my foolish mistake:
Socket::Socket() : impl(new Impl) {
    impl->fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    ....
}
Socket::ptr_t Socket::accept() {
    auto r = ::accept(impl->fd, NULL, NULL);
    ...
    ptr_t s(new Socket);
    s->impl->fd = r;
    return s;
}
As you can see, my constructor allocated a socket immediately, and then I replaced the descriptor with the one returned by accept - creating a leak. I had refactored the accept code from a standalone Acceptor class into the Socket class without changing this.
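One way to avoid this kind of leak is to give Socket a private constructor that adopts an existing descriptor, so accept() never calls socket() at all. The following is a minimal sketch rather than the original code: it drops the Impl indirection for brevity, assumes ptr_t is a std::shared_ptr, and the adopting constructor and the fd() accessor are made up for illustration.

```cpp
#include <cassert>
#include <memory>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

class Socket {
public:
    using ptr_t = std::shared_ptr<Socket>;  // assumption: the original ptr_t type is not shown

    // Public constructor: allocates a brand new TCP socket.
    Socket() : fd_(::socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) {}

    // Each Socket owns exactly one descriptor and closes it exactly once.
    ~Socket() {
        if (fd_ >= 0) ::close(fd_);
    }

    // Copying would lead to a double close, so forbid it.
    Socket(const Socket&) = delete;
    Socket& operator=(const Socket&) = delete;

    // Wraps the descriptor returned by ::accept() without creating another one.
    ptr_t accept() {
        int r = ::accept(fd_, nullptr, nullptr);
        if (r < 0) return nullptr;
        return ptr_t(new Socket(r));
    }

    int fd() const { return fd_; }  // hypothetical accessor, for inspection only

private:
    // Adopting constructor: takes ownership of an already open descriptor.
    explicit Socket(int fd) : fd_(fd) { assert(fd >= 0); }

    int fd_;
};
```

With this shape the default constructor is the only place that calls socket(), so an accepted connection costs one descriptor instead of two.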
Using strace I could easily see socket() being run each time, which led to my light bulb moment.
Thanks all for the help!
Upvotes: 5
Reputation: 1683
You are most probably hanging on a recv() or send() command. Consider setting a timeout using setsockopt.
I noticed a similar output on lsof when the socket was closed on the other end but my thread was keeping the socket open hanging on the recv() command waiting for data.
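As a rough illustration of the setsockopt suggestion, the SO_RCVTIMEO and SO_SNDTIMEO socket options put an upper bound on how long a blocking recv() or send() may wait. This is a sketch only; set_io_timeouts and recv_timed_out are hypothetical helper names, and the timeout value is whatever suits the server.

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/time.h>

// Hypothetical helper: applies the same timeout to blocking reads and writes.
// Returns true only if both options were set successfully.
bool set_io_timeouts(int fd, int seconds) {
    timeval tv{};
    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return ::setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) == 0 &&
           ::setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) == 0;
}

// Hypothetical helper: call right after recv()/send() with its return value.
// A call that gave up because of the timeout returns -1 with errno set to
// EAGAIN or EWOULDBLOCK.
bool recv_timed_out(ssize_t n) {
    return n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```

When a call times out, the thread gets control back and can close the descriptor instead of blocking on a peer that has gone away.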
Upvotes: 0