Reputation: 11994
Netty Version: 4.0.10.Final
I've written a client and server using Netty. Here is what the client and server do.
Server: If a message is bad, write an error message (6 bytes), flush it, close the socket, and do not read any remaining messages from the socket. Otherwise continue reading messages.
Client: After writing N good messages, write one bad message and continue writing M good messages. This process happens in a separate thread.
I've straced both the client and the server. I found that the server closes the connection after writing the error message, and the client starts seeing broken pipe errors when it writes good messages after the bad one. This is because the server detected the bad message, responded with the error message, and closed the socket; the connection is closed only after the write operation completes, using a listener. However, the client does not always read the error message from the server.
Earlier, step (2) in the client was performed in the I/O thread, which made the percentage of error messages received over K experiments very low (<10%). After moving step (2) to a separate thread, it rose to about 70%. Either way, it is not reliable. Does Netty trigger a channel read if a write fails due to a broken pipe?
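For reference, the close-after-write on the server side looks roughly like this. This is a simplified sketch, not my actual code; isBad(), process() and the 6-byte payload are placeholders:

    import io.netty.buffer.ByteBuf;
    import io.netty.channel.ChannelFutureListener;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInboundHandlerAdapter;

    public class ServerHandler extends ChannelInboundHandlerAdapter {
        @Override
        public void channelRead(ChannelHandlerContext ctx, Object msg) {
            ByteBuf in = (ByteBuf) msg;
            try {
                if (isBad(in)) {
                    // Write the 6-byte error message, then close the connection
                    // only once that write has completed.
                    ctx.writeAndFlush(errorResponse(ctx))
                       .addListener(ChannelFutureListener.CLOSE);
                    // Nothing else the client has already sent is read after this
                    // point; the client sees broken pipe on its later writes.
                } else {
                    process(in); // keep reading good messages
                }
            } finally {
                in.release();
            }
        }

        private boolean isBad(ByteBuf in) { return false; }        // protocol-specific check
        private ByteBuf errorResponse(ChannelHandlerContext ctx) { // placeholder 6-byte payload
            return ctx.alloc().buffer(6).writeZero(6);
        }
        private void process(ByteBuf in) { }                       // handle a good message
    }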
Update 1: I'm clarifying and answering the questions asked here, so everybody can find the questions and clarifications in one place. "You're writing a bad message that will cause a reset, followed by good messages that you already know won't get through, and trying to read a response that may have been thrown away. It doesn't make any sense to me whatsoever" - from EJP
-- In the real world, the server could treat something as bad for reasons the client can't know in advance. For simplicity, I said the client intentionally sends a bad message that causes a reset from the server. I would like all the good messages to be delivered even if there are bad messages among the total messages.
What I'm doing is similar to the protocol implemented by Apple Push Notification Service.
Upvotes: 0
Views: 1464
Reputation: 1431
The APN protocol appears to be quite awkward because it does not acknowledge successful receipt of a notification. Instead it just tells you which notifications it has successfully received when it encounters an error. The protocol is working on the assumption that you will generally send well formed notifications.
I would suggest that you need some sort of expiring cache (a LinkedHashMap might work here), and you need to use the opaque identifier field in the notification as a globally unique, ordered value. A sequence number will work (but you'll need to persist it if your client can be restarted).
Every time you generate an APN, add it to the map, keyed by its sequence number together with the time at which you send it.
If you receive an error, you need to reopen the connection and resend all the APNs in the map with a sequence number higher than the identifier reported in the error. This is relatively easy: just iterate through the map, removing any APN with a sequence number at or below the one reported. Then resend the remaining APNs in order, replacing them in the map with the current time (i.e. you remove an APN when you resend it, then re-insert it into the map with the new current time).
You'll need to periodically purge the map of old entries. Determine a reasonable length of time based on how long it takes the APN service to return an error when you send a malformed APN; I suspect it'll be a matter of seconds (if not much quicker). If, for example, you're sending 10 APNs per second and you know the APN server will definitely respond within 30 seconds, a 30-second expiry time, purging every second, might be appropriate. Just iterate over the map, removing any element whose time portion of its key is less than System.currentTimeMillis() - 30000 (for a 30-second expiry). You'll need to synchronize threads appropriately.
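Here's a rough sketch of that cache. The class and method names are made up, and I've kept the send time in the map value rather than as part of the key, which serves the same purpose:

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;

    public class ApnResendCache {

        // One cached notification: the payload plus the time it was last (re)sent.
        static final class Entry {
            final byte[] payload;
            long sentAtMillis;
            Entry(byte[] payload, long sentAtMillis) {
                this.payload = payload;
                this.sentAtMillis = sentAtMillis;
            }
        }

        private static final long EXPIRY_MILLIS = 30000; // assumes the server errors within 30s

        // LinkedHashMap preserves insertion order, which is also sequence-number
        // order because the sequence number only ever increases.
        private final Map<Integer, Entry> pending = new LinkedHashMap<Integer, Entry>();
        private int nextSequence; // would need persisting if the client can be restarted

        // Call every time you generate and write an APN; the returned sequence
        // number goes in the notification's opaque identifier field.
        public synchronized int recordSend(byte[] payload) {
            int seq = nextSequence++;
            pending.put(seq, new Entry(payload, System.currentTimeMillis()));
            return seq;
        }

        // Call when the server reports an error naming the last identifier it
        // accepted. Entries at or below that identifier are dropped; the rest are
        // re-timestamped and returned, oldest first, for resending.
        public synchronized List<byte[]> onErrorReported(int lastAcceptedSeq) {
            List<byte[]> toResend = new ArrayList<byte[]>();
            long now = System.currentTimeMillis();
            Iterator<Map.Entry<Integer, Entry>> it = pending.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<Integer, Entry> e = it.next();
                if (e.getKey() <= lastAcceptedSeq) {
                    it.remove();                     // the server got this one
                } else {
                    e.getValue().sentAtMillis = now; // about to be resent
                    toResend.add(e.getValue().payload);
                }
            }
            return toResend;
        }

        // Call periodically (say once a second) to drop entries old enough that
        // the server would already have reported an error for them.
        public synchronized void purgeExpired() {
            long cutoff = System.currentTimeMillis() - EXPIRY_MILLIS;
            Iterator<Entry> it = pending.values().iterator();
            while (it.hasNext()) {
                if (it.next().sentAtMillis < cutoff) {
                    it.remove();
                }
            }
        }
    }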
I would catch any IOException caused by writing, place the APN you were attempting to write in the map, and resend it.
What you cannot cope with is a genuine network error where you do not know whether the APN service received the notification (or a batch of notifications). You'll have to decide, based on what your service does, whether to resend the affected APNs immediately, after some time period, or not at all. If you resend after a time period, you'll want to give them new sequence numbers at the point you send them; this allows you to send new APNs in the meantime.
Upvotes: 0
Reputation: 310913
If a message is bad, write an error message (6 bytes), flush it, close the socket, and do not read any remaining messages from the socket. Otherwise continue reading messages.
That will cause a connection reset, which will be seen by the client as a broken pipe in Unix, Linux etc.
After writing N good messages, write one bad message and continue writing M good messages.
That will encounter the broken pipe error just mentioned.
This process happens in a separate thread.
Why? The whole point of NIO and therefore Netty is that you don't need extra threads.
I found that the server closes the connection after writing the error message.
Well that's what you said it does, so it does it.
The client began seeing broken pipe errors when writing good messages after the bad message.
As I said.
This is because the server detected the bad message, responded with the error message, and closed the socket.
Correct.
The client does not always read the error message from the server.
Due to the connection reset. The delivery of pending data ceases after a reset.
Does Netty trigger a channel read if a write fails due to a broken pipe?
No, it triggers a read when data or EOS (end of stream) arrives.
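For illustration only (this isn't your code, and the handler names are invented), here is where each event surfaces in a Netty 4 client handler: the server's error message, if it survives the reset, arrives via channelRead(); a write that hits the broken pipe fails its ChannelFuture and fires exceptionCaught(), but it does not cause a read.

    import io.netty.buffer.ByteBuf;
    import io.netty.channel.ChannelFuture;
    import io.netty.channel.ChannelFutureListener;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInboundHandlerAdapter;

    public class ClientHandler extends ChannelInboundHandlerAdapter {

        @Override
        public void channelRead(ChannelHandlerContext ctx, Object msg) {
            ByteBuf in = (ByteBuf) msg;
            try {
                // The 6-byte error message shows up here, but only if the reset
                // hasn't already discarded it.
                System.out.println("error message from server: " + in.readableBytes() + " bytes");
            } finally {
                in.release();
            }
        }

        @Override
        public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) {
            // Read-side errors (e.g. connection reset by peer) surface here.
            cause.printStackTrace();
            ctx.close();
        }

        // A failed write is reported on the write's future, not as a read event.
        static void write(ChannelHandlerContext ctx, ByteBuf msg) {
            ChannelFuture f = ctx.writeAndFlush(msg);
            f.addListener(new ChannelFutureListener() {
                @Override
                public void operationComplete(ChannelFuture future) {
                    if (!future.isSuccess()) {
                        future.cause().printStackTrace(); // broken pipe ends up here
                    }
                }
            });
        }
    }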
However your bizarre system design/protocol is making that unpredictable if not impossible. You're writing a bad message that will cause a reset, followed by good messages that you already know won't get through, and trying to read a response that may have been thrown away. It doesn't make any sense to me whatsoever. What are you trying to prove here?
Try a request-response protocol like everybody else.
Upvotes: 1