dbeer

Reputation: 7203

Socket Stuck in CLOSE_WAIT

I am getting sockets stuck in CLOSE_WAIT when two of my daemons talk to each other. After reading various questions and blog entries on the subject, I have verified that I am closing the socket on both sides (originator and receiver).

The model goes as follows:

Sender: establish connection, send data, wait for confirmation, close connection

Receiver: receive connection, read data, send confirmation, close connection

Can anyone tell me what I'm doing wrong? Note: I am currently using close() to close the connections. I have tried using shutdown() as well and it hasn't changed anything. Any hints would be greatly appreciated.

EDIT: Shortly after closing the socket, the receiving daemon forks. I have tried passing the file descriptor to the function that forks and explicitly closing it again in the child process, but this did not fix my problem. Is there any other way that forking could affect this process? Note that the sending daemon does not fork.

Upvotes: 0

Views: 6518

Answers (4)

mdk

Reputation: 6523

These are actually quite common problems in multi-threaded server applications. There are two things you could do to resolve this problem:

  1. Use FD_CLOEXEC on the sockets.
  2. Use setsockopt to enable SO_KEEPALIVE (TCP keepalive) on the sockets.

The code for both of the above solutions can differ a little between *NIX and Windows. The difference is only due to differences in the platform socket APIs.

I would recommend implementing both of the above measures.
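As a rough illustration of the two measures, here is a minimal sketch in Python for a POSIX system, using the `fcntl` and `socket` modules, which wrap the same system calls. The helper name `harden_socket` is made up for this example. Two caveats: FD_CLOEXEC only closes the descriptor when the process calls exec, so a child that forks without exec still inherits the socket; and Python 3.4+ already creates sockets with close-on-exec set, so the first step is mainly shown for parity with C.

```python
import fcntl
import socket


def harden_socket(sock):
    """Hypothetical helper: set close-on-exec and enable TCP keepalive."""
    fd = sock.fileno()
    # Read the current descriptor flags and add FD_CLOEXEC to them.
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags | fcntl.FD_CLOEXEC)
    # Ask the kernel to probe idle connections so dead peers get noticed.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock


sock = harden_socket(socket.socket(socket.AF_INET, socket.SOCK_STREAM))

# Read both settings back to confirm they took effect.
cloexec_set = bool(fcntl.fcntl(sock.fileno(), fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

Keepalive helps detect a peer that has vanished; it does not by itself move a socket out of CLOSE_WAIT, which still requires the local close().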

However, if you cannot modify the code, then you could use libkeepalive.

Upvotes: 0

dbeer

Reputation: 7203

After looking in Wireshark, I saw that the final FIN/ACK was annotated with:

"[TCP ACKed lost segment] [TCP previous segment lost] ..."

It turns out that my problem was caused by having both daemons running on the same box (something we had added for testing). After trying again on multiple boxes, we no longer get this problem.

Upvotes: 1

Neowizard

Reputation: 3017

In my (short) experience, it's very possible that you're closing the wrong fd, or not even reaching the "close" statement at all. I stumbled upon the latter, and the first clue was that my application became a zombie instead of exiting (specifically, a simple printf right before the close statement made it all go to hell).

It might be worth your time to check the task manager/jobs/system monitor/<whatever process viewer is relevant to your OS>.
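One way to rule out the "never reaching close" case is to tie the close to scope exit instead of relying on a statement at the end of the handler. The sketch below is only an illustration in Python: `handle_connection` and the payload values are invented for the example, and it uses a local `socketpair()` rather than the daemons from the question.

```python
import socket


def handle_connection(conn):
    """Hypothetical handler; the 'with' block closes conn on every exit path."""
    with conn:
        data = conn.recv(1024)
        if data != b"expected":
            # An early exit like this would skip a close() placed at the end.
            raise ValueError("unexpected payload")
        conn.sendall(b"OK")


a, b = socket.socketpair()
a.sendall(b"garbage")

try:
    handle_connection(b)
    handler_failed = False
except ValueError:
    handler_failed = True

fd_after = b.fileno()               # -1 once the socket object is closed
peer_saw_eof = a.recv(1024) == b""  # the peer observes the close as EOF
a.close()
```

In C the equivalent discipline is a single cleanup path (for example a `goto cleanup` label) that every error branch passes through.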

Upvotes: 0

Abhishek Chandel

Reputation: 1354

When an application has an open socket and, after some sending and receiving, it receives a FIN from its peer, the socket moves to the CLOSE_WAIT state from then on. It can remain in that state indefinitely, until you explicitly call close() on it. I hope you are actually passing the right FD to close().
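The sequence can be sketched over loopback TCP in Python, following the sender/receiver model from the question. This is a minimal sketch, not the asker's code: the variable names and payloads are assumptions, and both ends live in one process for brevity. The key point is that `recv()` returning an empty result means the peer's FIN has arrived, and the socket sits in CLOSE_WAIT until the local side closes it.

```python
import socket

# Receiver side: listen on an ephemeral loopback port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)

# Sender side: establish the connection.
sender = socket.create_connection(listener.getsockname())
conn, _ = listener.accept()

sender.sendall(b"payload")           # sender: send data
data = conn.recv(1024)               # receiver: read data
conn.sendall(b"OK")                  # receiver: send confirmation
confirmation = sender.recv(1024)     # sender: wait for confirmation
sender.close()                       # sender: close, which sends a FIN

# The receiver's socket is in CLOSE_WAIT once that FIN arrives.
eof = conn.recv(1024)                # empty bytes signal the peer's FIN
conn.close()                         # this close() is what leaves CLOSE_WAIT
closed_fd = conn.fileno()            # -1 confirms this descriptor is closed
listener.close()
```

If `conn.close()` is skipped, or called on a different descriptor, the connection stays in CLOSE_WAIT for as long as any process still holds the descriptor open.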

Upvotes: 0
