black

Reputation: 357

Sockets (in Java). Splitting byte chunks

Let's say you have a continuous binary stream of data, and it has to be split back into the individual pieces that were sent. What is the best way to do it?

Socket.read(byte[] arr) doesn't guarantee that you will receive exactly the same number of bytes as you sent using Socket.write(byte[] arr). arr may be split (out of 10 bytes you first read 8 and then 2) or spliced together with data from another write.

One way of solving this is to send the byte array's size first: read exactly 4 bytes, convert them into an integer x, then read x bytes. But this only works over TCP, and it can completely mess everything up if you send a wrong byte array size even once.

Another one I can think of is prefixing chunks of data with pseudo-random byte sequences. Initialize Random on the client and the server with the same seed and use its random.nextBytes(byte[] arr) for the prefixes. The downside is that the prefix has to be pretty long to make it very unlikely that the same sequence shows up inside an actual data chunk, which adds a lot of useless traffic. And again, it is no way out for UDP sockets.

So what are other good ways of doing this, and are there any simple libraries that would let me simply call conn.sendDataChunk(byte[] arr)?

Upvotes: 0

Views: 1038

Answers (2)

mattm

Reputation: 5949

Regarding UDP sockets: reading a UDP socket is not the same as reading a TCP socket. You either get the UDP packet or you do not. From UDP:

Datagrams – Packets are sent individually and are checked for integrity only if they arrive. Packets have definite boundaries which are honored upon receipt, meaning a read operation at the receiver socket will yield an entire message as it was originally sent.

So for UDP, you do not need to worry about reading an incorrect number of bytes. But you do have to worry about what happens if some data does not arrive, or if it arrives in a different order than it was sent.
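To illustrate the point about datagram boundaries, here is a minimal sketch that sends one datagram over the loopback interface and reads it back. The class name UdpBoundaryDemo, the method roundTrip, the 2048-byte buffer, and the 2-second timeout are all assumptions made for this example, not something from the question.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.util.Arrays;

final class UdpBoundaryDemo {
    /** Sends one datagram to a local receiver and returns exactly what one receive() yields. */
    static byte[] roundTrip(byte[] message) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback); // port 0 = any free port
             DatagramSocket sender = new DatagramSocket()) {
            // Assumed timeout so the demo fails instead of hanging if the packet is lost.
            receiver.setSoTimeout(2000);

            sender.send(new DatagramPacket(message, message.length,
                    loopback, receiver.getLocalPort()));

            // The buffer must be at least as large as the biggest expected datagram;
            // anything beyond the buffer size is silently dropped.
            byte[] buffer = new byte[2048];
            DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
            receiver.receive(packet);

            // getLength() is the size of this one datagram, not of the buffer.
            return Arrays.copyOfRange(packet.getData(), packet.getOffset(),
                    packet.getOffset() + packet.getLength());
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] received = roundTrip(new byte[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
        System.out.println("received " + received.length + " bytes in one receive()");
    }
}
```

Each receive() returns one whole datagram or nothing, so no length prefix is needed for splitting. Loss and reordering still have to be handled separately, as noted above.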

Upvotes: 0

John Kugelman

Reputation: 361595

One of the ways of solving this is specifying incoming byte array's size first. Read exactly 4 bytes, convert them into an Integer x, then read x bytes.

Yep, that's exactly what you should do. In other words, you're adding a message header before each message. This is a practical necessity when you want to layer a message-based network protocol atop a stream-based one that has no concept of message boundaries. (TCP purposefully obscures the IP packet boundaries.)

You could also use this as an opportunity to add other fields to the message header, such as a message ID number to help you distinguish between different message types.
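As a rough sketch of such a header, the class below packs a one-byte message type and a four-byte length into five bytes. The name MessageHeader, the field layout, and the choice of a single type byte are assumptions for illustration only; a real protocol may need different fields or sizes.

```java
import java.nio.ByteBuffer;

final class MessageHeader {
    // Assumed layout: 1 byte message type, then a 4-byte big-endian payload length.
    static final int SIZE = 5;

    final byte type;
    final int length;

    MessageHeader(byte type, int length) {
        this.type = type;
        this.length = length;
    }

    /** Serializes the header; ByteBuffer writes big-endian by default. */
    byte[] encode() {
        return ByteBuffer.allocate(SIZE).put(type).putInt(length).array();
    }

    /** Parses a header from exactly SIZE bytes. */
    static MessageHeader decode(byte[] bytes) {
        if (bytes.length != SIZE) {
            throw new IllegalArgumentException("header must be " + SIZE + " bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new MessageHeader(buf.get(), buf.getInt());
    }

    public static void main(String[] args) {
        MessageHeader header = MessageHeader.decode(new MessageHeader((byte) 2, 258).encode());
        System.out.println("type=" + header.type + " length=" + header.length);
    }
}
```

The receiver reads SIZE bytes first, decodes them, and then knows both how many payload bytes follow and how to interpret them.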

But this only works in TCP and may completely mess everything up if just one time you will send wrong byte array size.

This is true. So don't send a malformed header!

In all seriousness, a header with a length field is standard practice. It's a good idea to add sanity checks on the length: make sure it's not too large (or negative) so you don't end up allocating 2GB of memory to read the next message.

Also, don't assume you can read the whole header with a single read(). It may take multiple reads to get the whole header, and the same goes for the message body.
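Putting the length prefix, the sanity check, and the multiple-reads caveat together might look like the sketch below. The class name LengthPrefixedFraming, the helper names sendDataChunk and receiveDataChunk (borrowed from the question), and the 1 MiB limit are assumptions; DataInputStream.readInt() and readFully() are standard JDK methods that keep reading until the requested bytes have arrived or the stream ends.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class LengthPrefixedFraming {
    // Assumed upper bound for one message; pick a value that fits your protocol.
    static final int MAX_MESSAGE_SIZE = 1 << 20; // 1 MiB

    /** Writes a 4-byte big-endian length followed by the payload. */
    static void sendDataChunk(OutputStream out, byte[] payload) throws IOException {
        if (payload.length > MAX_MESSAGE_SIZE) {
            throw new IllegalArgumentException("payload too large: " + payload.length);
        }
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(payload.length);
        data.write(payload);
        data.flush();
    }

    /** Reads one complete message, however many underlying reads that takes. */
    static byte[] receiveDataChunk(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int length = data.readInt(); // throws EOFException if the stream ends early
        if (length < 0 || length > MAX_MESSAGE_SIZE) {
            // Reject the length before allocating anything.
            throw new IOException("invalid message length: " + length);
        }
        byte[] payload = new byte[length];
        data.readFully(payload); // loops until all 'length' bytes are read
        return payload;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for socket.getOutputStream()/getInputStream().
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        sendDataChunk(wire, "hello".getBytes(StandardCharsets.UTF_8));
        sendDataChunk(wire, "world!".getBytes(StandardCharsets.UTF_8));

        InputStream in = new ByteArrayInputStream(wire.toByteArray());
        System.out.println(new String(receiveDataChunk(in), StandardCharsets.UTF_8));
        System.out.println(new String(receiveDataChunk(in), StandardCharsets.UTF_8));
    }
}
```

With a real connection you would pass the socket's streams instead of the in-memory ones. After a rejected length the stream position can no longer be trusted, so closing the connection is usually the simplest recovery.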

Upvotes: 1
