Chance

Reputation: 2700

Keepalive time - cannot reduce below one minute in C++

I set the keepalive options in a C++ application that writes to a TCP port, using the code below. The return-status checks are not shown, but I do verify that each call to set an option succeeds.

int option = 1;
int keepalive_intvl = 1;
int keepalive_count = 1;
int keepalive_idle = 1;

setsockopt(the_socket, SOL_SOCKET, SO_KEEPALIVE, &option, sizeof (int) );
setsockopt(the_socket, SOL_TCP, TCP_KEEPINTVL, &keepalive_intvl, sizeof(int));
setsockopt(the_socket, SOL_TCP, TCP_KEEPCNT, &keepalive_count, sizeof(int));
setsockopt(the_socket, SOL_TCP, TCP_KEEPIDLE, &keepalive_idle, sizeof(int));
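
For reference, the return-status checks mentioned above could look roughly like the following. This is only a minimal sketch for Linux: set_int_option and enable_fast_keepalive are hypothetical names that are not part of the original code, and IPPROTO_TCP is used in place of SOL_TCP (the two have the same value on Linux).

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Hypothetical helper: set one integer socket option and report any failure.
static bool set_int_option(int fd, int level, int name, int value, const char* label) {
    if (setsockopt(fd, level, name, &value, sizeof(value)) == -1) {
        std::fprintf(stderr, "setsockopt(%s) failed: %s\n", label, std::strerror(errno));
        return false;
    }
    return true;
}

// Enable keepalive with the values from the question:
// 1 second idle, 1 second between probes, 1 probe.
// Stops at the first option that cannot be set.
bool enable_fast_keepalive(int fd) {
    return set_int_option(fd, SOL_SOCKET, SO_KEEPALIVE, 1, "SO_KEEPALIVE")
        && set_int_option(fd, IPPROTO_TCP, TCP_KEEPIDLE, 1, "TCP_KEEPIDLE")
        && set_int_option(fd, IPPROTO_TCP, TCP_KEEPINTVL, 1, "TCP_KEEPINTVL")
        && set_int_option(fd, IPPROTO_TCP, TCP_KEEPCNT, 1, "TCP_KEEPCNT");
}
```

A caller would invoke enable_fast_keepalive(the_socket) once after the socket is created and treat a false return as a configuration error.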

My application is writing to a TCP port, and attempts a write a few times per second.

// write null packet to determine if connection is still good
return ( send( GetDescriptor(),(char*)NULL, 0, 0 ) != -1 );

Whenever I close the connection on the other (input) side, it takes one minute for my application to report that the connection is down, based on the test above. If I install a SIGPIPE handler, it also takes one minute before that handler is called.

All the documentation I've seen says the keepalive parameters are in seconds, not minutes. Yet I cannot get a dropped connection detected in less than one minute.

I also tried changing the system variables related to keepalive discussed at tldp.org, but to no avail.

echo 1 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 1 > /proc/sys/net/ipv4/tcp_keepalive_intvl
echo 1 > /proc/sys/net/ipv4/tcp_keepalive_probes

Is this behavior controlled by another system parameter? Are the keepalive parameters actually in minutes, contrary to some documentation? Is there a certain function I should look for in the code that could affect this timeout parameter?

Upvotes: 2

Views: 4247

Answers (3)

Chance

Reputation: 2700

I was able to shorten the overall detection time via the TCP_LINGER2 value.

After I close the input TCP process, netstat -an shows the following lines.

tcp        1      0 127.0.0.1:32962         127.0.0.1:7780          CLOSE_WAIT  
tcp        0      0 127.0.0.1:7780          127.0.0.1:32962         FIN_WAIT2  

I can change this FIN_WAIT2 time in two different ways.

On the system level, according to this link, I can change it by modifying a system file as follows:

% cat /proc/sys/net/ipv4/tcp_fin_timeout
60

[To change this to 3 seconds]
# echo "3" > /proc/sys/net/ipv4/tcp_fin_timeout

My output TCP application indicates that the connection is dropped in about four seconds (I imagine 3 for the wait time, 1 for the keepalive idle).

I can also change this at the individual socket level in code. In the file /usr/include/netinet/tcp.h, I see the following:

#define TCP_LINGER2  8  /* Life time of orphaned FIN-WAIT-2 state */

So, adding the following in my code,

int wait_time = 3;
setsockopt(the_socket, SOL_TCP, TCP_LINGER2, &wait_time,sizeof(int));

will have the same effect as varying the system parameter.

I do agree with the other answers that application-level keepalives are really the way to go. And, as mentioned here,
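
A slightly more defensive version of that call checks the return status and reads the value back. This is a sketch for Linux only; set_fin_wait2_lifetime is a hypothetical name, and IPPROTO_TCP stands in for SOL_TCP (same value on Linux).

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Set the orphaned FIN-WAIT-2 lifetime (in seconds) on one socket.
// Returns the value the kernel reports afterwards, or -1 if either call fails.
int set_fin_wait2_lifetime(int fd, int seconds) {
    if (setsockopt(fd, IPPROTO_TCP, TCP_LINGER2, &seconds, sizeof(seconds)) == -1) {
        return -1;
    }
    int actual = 0;
    socklen_t len = sizeof(actual);
    if (getsockopt(fd, IPPROTO_TCP, TCP_LINGER2, &actual, &len) == -1) {
        return -1;
    }
    return actual;
}
```

Comparing the returned value with the requested one is a cheap way to notice if the kernel adjusted or rejected the setting.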

RFC 1122, section 4.2.3.6 indicates that acknowledgements for TCP keepalives without data may not be transmitted reliably by routers; this may cause valid connections to be dropped. Furthermore, TCP/IP stacks are not required to support keepalives at all (and many embedded stacks do not), so this solution may not translate to other platforms.

However, in a non-test environment, I don't have access to the TCP input processes, so I cannot implement the other side of an application-level keepalive there. TCP keepalives may be my only option.

Upvotes: 1

zzk

Reputation: 1387

TCP_KEEPCNT (since Linux 2.4) The maximum number of keepalive probes TCP should send before dropping the connection. This option should not be used in code intended to be portable.

Maybe that could be the reason. You could implement your own keepalive in your application; it should be pretty easy. Just start poking the other end if no application data or keepalive "heartbeat" arrives.
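
A rough sketch of such a poke is below. Everything here is an assumption for illustration: the one-byte kPing/kPong messages and the peer_answers_ping name are hypothetical, and the code assumes no other traffic is interleaved on the socket while waiting for the reply. MSG_NOSIGNAL (Linux) keeps a dead connection from raising SIGPIPE.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical one-byte wire format; a real protocol would define its own messages.
constexpr char kPing = 'P';
constexpr char kPong = 'A';

// Send one ping and wait up to timeout_ms for the reply.
// Returns true only if the peer answered with kPong in time.
bool peer_answers_ping(int fd, int timeout_ms) {
    if (send(fd, &kPing, 1, MSG_NOSIGNAL) != 1) {
        return false;  // the send itself failed, e.g. EPIPE or ECONNRESET
    }

    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready <= 0) {
        return false;  // timed out (0) or poll failed (-1)
    }

    char reply = 0;
    ssize_t n = recv(fd, &reply, 1, 0);
    return n == 1 && reply == kPong;  // n == 0 means the peer closed the connection
}
```

The application would call this whenever the connection has been idle longer than its chosen interval and treat a false result as a dropped connection.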

Upvotes: 1

mark

Reputation: 5469

Your best bet is an application-layer keep-alive; that is, send a no-operation (NOP) message every X seconds and expect a reasonably quick NOP-acknowledgement (NOP-ACK). Also, if your remote connection close is "graceful", then your send should unblock nearly immediately. If it's not graceful (e.g. a network element failed), then your application-layer keep-alive will detect the loss at your next time X+(expected response time)...
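
For the graceful case, one way to notice the close promptly is to look at the receive side, since an orderly shutdown by the peer shows up as a zero-byte read. The following is a minimal sketch under that assumption; PeerState and probe_peer are hypothetical names, and MSG_DONTWAIT is used so the check never blocks.

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>

enum class PeerState { Open, Closed, Error };

// Peek at the receive side without blocking and without consuming data.
// Note: if unread data is still queued ahead of the peer's close, this
// reports Open until that data has been read.
PeerState probe_peer(int fd) {
    char byte;
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0) {
        return PeerState::Closed;  // orderly shutdown by the peer
    }
    if (n > 0) {
        return PeerState::Open;    // data pending; MSG_PEEK leaves it in the buffer
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        return PeerState::Open;    // nothing to read yet, no close seen
    }
    return PeerState::Error;       // e.g. ECONNRESET or EBADF
}
```

This only covers a close that the peer's stack actually announces; a silent network failure still needs the keep-alive timeout described above.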

Upvotes: 1
