user1735067

Reputation: 141

close on socket not releasing file descriptor

While stress testing some server code I wrote, I noticed that even though I call close() on each descriptor (and check the return value for errors), the descriptor is not released. Eventually accept() fails with the error "Too many open files".

I understand that this is because of the ulimit. What I don't understand is why I am hitting it when I call close() after each synchronous accept/read/send cycle.

I am validating that the descriptors are in fact there by running a watch with lsof:

ctsvr  9733 mike 1017u  sock     0,7      0t0 3323579 can't identify protocol
ctsvr  9733 mike 1018u  sock     0,7      0t0 3323581 can't identify protocol
...

And sure enough, there are about 1000 of them. Furthermore, checking with netstat shows no hanging TCP states (no WAIT or STOPPED or anything).

Even a single connect/send/recv from the client leaves a socket listed in lsof, so this is not a load issue.

The server is running on an Ubuntu Linux 64-bit machine.

Any thoughts?

Upvotes: 9

Views: 7414

Answers (3)

HiJack

Reputation: 23

Have you tried calling perror() right after close()? The error message it prints may give you some help.
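A minimal sketch of that check, assuming a POSIX system. The helper name close_checked is hypothetical:

```cpp
#include <cstdio>
#include <unistd.h>

// Hypothetical helper: closes fd and, on failure, prints the reason to stderr.
// Returns true if close() succeeded.
bool close_checked(int fd) {
    if (::close(fd) == -1) {
        // perror() prints "close: " followed by the message for the current errno.
        std::perror("close");
        return false;
    }
    return true;
}
```

Note that perror() is only meaningful when close() has returned -1; after a successful call, errno is not guaranteed to hold anything useful.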

Upvotes: 1

user1735067

Reputation: 141

So using strace (thanks Gearoid), which I have no idea how I ever lived without, I confirmed that I was in fact closing the descriptors.

However, for the sake of posterity, I lay bare my foolish mistake:

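Socket::Socket() : impl(new Impl) {
    // Every Socket object opens a brand new descriptor here.
    impl->fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    ....
}

Socket::ptr_t Socket::accept() {
    auto r = ::accept(impl->fd, NULL, NULL);
    ...
    ptr_t s(new Socket);   // the constructor calls socket() again
    s->impl->fd = r;       // overwrites that descriptor without closing it: the leak
    return s;
}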

As you can see, my constructor allocated a socket immediately, and accept() then overwrote that descriptor with the one returned by ::accept(). The socket created in the constructor was never closed, so every accepted connection leaked one descriptor. I had refactored the accept code from a standalone Acceptor class into the Socket class without changing this.

Using strace I could easily see socket() being called on every accept, which led to my light bulb moment.

Thanks all for the help!

Upvotes: 5

phininity

Reputation: 1683

You are most probably blocking in a recv() or send() call. Consider setting a timeout using setsockopt().

I noticed similar lsof output when the socket had been closed on the other end, but my thread was keeping it open while blocked in recv() waiting for data.

Upvotes: 0
