Vincent

Reputation: 3

C - How to determine the number of bytes for JSON messages

I am working on a Linux-based project consisting of a "core" application, written in C, and a web server, probably written in Python. The core and web server must be able to communicate with each other over TCP/IP. My focus is on the core application, in C.

Because of the different programming languages used for the core and web server, I am looking for a message protocol which is easy to use in both languages. Currently I think JSON is a good candidate. My question, however, is not so much about the message protocol itself, but about how I would determine the number of bytes to read from (and maybe send to) the socket, specifically when using a message protocol like JSON or XML.

As I understand it, whether you use JSON, XML, or some other message protocol, you cannot include the size of the message in the message itself, because in order to parse the message you would need all of it, and therefore would already need to know its size in advance. Note that by "message" I mean the data formatted according to the message protocol in use.

I've been thinking and reading about solutions to this, and have come up with the following two possibilities:

  1. Determine the largest possible size of a message, say 500 bytes, base the buffer size on that, say 512 bytes, and pad each message so that exactly 512 bytes are always sent;
  2. Prepend each message with its size in "plain text". If the size is stored in a 4-byte integer, then the receiver first reads 4 bytes from the socket and uses them to determine how many bytes to read next for the actual message (a rough sketch of this is included below).
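For reference, here is a rough, untested sketch of what I have in mind for option 2 on the C side, assuming the sender writes the size as a 4-byte big-endian integer before the JSON payload (the helper names read_exact and recv_message are made up):

    #include <arpa/inet.h>    /* ntohl */
    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Read exactly `len` bytes from the socket, looping because recv()
     * may return fewer bytes than requested. Returns 0 on success, -1 on error/EOF. */
    static int read_exact(int fd, void *buf, size_t len)
    {
        unsigned char *p = buf;
        while (len > 0) {
            ssize_t n = recv(fd, p, len, 0);
            if (n <= 0)
                return -1;    /* error, or the peer closed the connection */
            p += n;
            len -= (size_t)n;
        }
        return 0;
    }

    /* Receive one length-prefixed message: 4-byte big-endian size, then payload.
     * Returns a malloc'd, NUL-terminated buffer, or NULL on failure. */
    static char *recv_message(int fd)
    {
        uint32_t netlen;
        if (read_exact(fd, &netlen, sizeof netlen) != 0)
            return NULL;

        uint32_t len = ntohl(netlen);    /* convert from network byte order */
        /* A real implementation should reject absurdly large sizes here. */
        char *msg = malloc((size_t)len + 1);
        if (msg == NULL)
            return NULL;

        if (read_exact(fd, msg, len) != 0) {
            free(msg);
            return NULL;
        }
        msg[len] = '\0';    /* safe: JSON is text and contains no NUL bytes */
        return msg;
    }

On the Python side, struct.pack('!I', len(payload)) produces the matching 4-byte big-endian prefix.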

Because the solutions I've read about weren't specifically about using a message protocol like JSON, I think it's possible that I am missing something.

So, which of the two possibilities I described is better, or is there some other solution to this problem that I am not aware of?

Kind regards.

Upvotes: 0

Views: 474

Answers (1)

sudo

Reputation: 5804

This is a classic problem encountered with streams, including those of TCP, often called the "message boundary problem." You can search around for more detailed answers than what I can give here.

To determine boundaries, you have some options:

  • Fixed length with padding, like you said. Not advisable unless your messages are very small.
  • Prepend with the size, like you said. If you want to get fancy and support large messages without wasting too many bytes, you can use a variable-length quantity, where one bit of each byte determines whether to read more bytes for the size (a sketch of such an encoding follows this list). @alnitak mentioned a drawback in the comments that I neglected, which is that you can't start sending until you know the size.
  • Delimit with some byte you don't use anywhere else (JSON and XML are text-only, so '\0' works with ASCII or UTF-8). Simple, but slower on the receiving end because you have to scan every byte this way (see the second sketch below).
  • Edit: JSON, XML, and many other formats can also be parsed on-the-fly to determine boundaries (e.g. each { must be closed with } in JSON), but I don't see any advantage to doing this.
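To make the variable-length quantity idea concrete, here is a minimal, untested sketch in C. The 7-data-bits-per-byte, least-significant-group-first layout is just one common convention (protobuf varints work this way), and the function names are made up for illustration:

    #include <stddef.h>
    #include <stdint.h>

    /* Encode `value` as a variable-length quantity: 7 data bits per byte,
     * high bit set on every byte except the last, least-significant group first.
     * Writes at most 5 bytes to `out`; returns the number of bytes written. */
    static size_t vlq_encode(uint32_t value, unsigned char out[5])
    {
        size_t n = 0;
        do {
            unsigned char byte = value & 0x7F;
            value >>= 7;
            if (value != 0)
                byte |= 0x80;    /* more bytes follow */
            out[n++] = byte;
        } while (value != 0);
        return n;
    }

    /* Decode a VLQ from `in` (at most 5 bytes for a 32-bit value).
     * Returns the number of bytes consumed, or 0 if the input is
     * malformed or incomplete. */
    static size_t vlq_decode(const unsigned char *in, size_t avail, uint32_t *value)
    {
        uint32_t result = 0;
        for (size_t i = 0; i < avail && i < 5; i++) {
            result |= (uint32_t)(in[i] & 0x7F) << (7 * i);
            if ((in[i] & 0x80) == 0) {    /* last byte of the quantity */
                *value = result;
                return i + 1;
            }
        }
        return 0;
    }

With this scheme a message under 128 bytes spends only one prefix byte, and a 32-bit size never needs more than five.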
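And a deliberately naive, untested sketch of the delimiter approach, reading one byte at a time until the terminating '\0' (a real implementation would read in chunks and keep leftover bytes in a per-connection buffer):

    #include <stdlib.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Read bytes until a '\0' delimiter arrives, growing the buffer as needed.
     * Returns a malloc'd, NUL-terminated string, or NULL on error/EOF. */
    static char *recv_until_nul(int fd)
    {
        size_t cap = 256, len = 0;
        char *buf = malloc(cap);
        if (buf == NULL)
            return NULL;

        for (;;) {
            char c;
            ssize_t n = recv(fd, &c, 1, 0);
            if (n <= 0) {            /* error, or the peer closed the connection */
                free(buf);
                return NULL;
            }
            if (len + 1 > cap) {     /* grow the buffer */
                cap *= 2;
                char *tmp = realloc(buf, cap);
                if (tmp == NULL) {
                    free(buf);
                    return NULL;
                }
                buf = tmp;
            }
            buf[len++] = c;
            if (c == '\0')           /* delimiter found: message complete */
                return buf;
        }
    }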

If this isn't just a learning exercise, you can instead use an existing protocol that handles all of this for you, for example HTTP (less efficient) or gRPC (more efficient).

Edit: I originally said something totally wrong about having to include a checksum to handle packet loss in spite of TCP... TCP won't advance until those packets are properly received, so that's not an issue. I don't know what I was thinking.

Upvotes: 2
