Reputation: 413

Reading wrong data from TCP socket

I'm trying to send data blockwise over a TCP socket. The server code does the following:

#define CHECK(n) if((r=n) <= 0) { perror("Socket error\n"); exit(-1); }
int r;

//send the number of blocks
CHECK(write(sockfd, &(storage->length), 8)); //storage->length is uint64_t

for(p=storage->first; p!=NULL; p=p->next) {
  //send the size of this block
  CHECK(write(sockfd, &(p->blocksize), 8)); //p->blocksize is uint64_t

  //send data
  CHECK(write(sockfd, &(p->data), p->blocksize));
}

On the client side, I read the size and then the data (same CHECK macro):

CHECK(read(sockfd, &block_count, 8));
for(i=0; i<block_count; i++) {
  uint64_t block_size;
  CHECK(read(sockfd, &block_size, 8));

  uint64_t read_in=0;
  while(read_in < block_size) {
    r = read(sockfd, data+read_in, block_size-read_in); //assume data was previously allocated as char*
    read_in += r;
  }
}

This works perfectly fine as long as both client and server run on the same machine, but as soon as I try it over the network, it fails at some point. In particular, the first 300-400 blocks (about 587 bytes each) work fine, but then I get an incorrect block_size reading:

received block #372 size : 586
read_in: 586 of 586
received block #373 size : 2526107515908

And then it crashes, obviously. I was under the impression that the TCP protocol ensures no data is lost and everything is received in correct order, but then how is this possible and what's my mistake here, considering that it already works locally?

Upvotes: 1

Answers (4)

Arun Taylor

Reputation: 1572

The reason it works on the same machine is that block_size and block_count are sent as raw binary values, and when the client receives and interprets them, they have the same values.

However, if the two communicating machines use a different byte order for representing integers, e.g. x86 versus SPARC, or sizeof(int) is different, e.g. 64 bit versus 32 bit, then the code will not work correctly.

You need to verify that sizeof(int) and the byte order are identical on both machines. On the server side, print out sizeof(int) and the values of storage->length and p->blocksize. On the client side, print out sizeof(int) and the values of block_count and block_size.

When it doesn't work correctly, I think you will find that they are not the same. If so, the contents of data will also be misinterpreted if it contains any binary data.
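
If byte order does turn out to differ between the two machines, one way to take it out of the picture is to put the 64-bit lengths on the wire in a fixed byte order. The following is only a minimal sketch: the helper names encode_u64_be and decode_u64_be are hypothetical and not part of the question's code, and big-endian is chosen here simply because it is the conventional network byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 64-bit value as 8 bytes, most significant
   byte first, regardless of the host's own byte order. */
static void encode_u64_be(uint64_t value, unsigned char out[8])
{
    for (size_t i = 0; i < 8; i++)
        out[i] = (unsigned char)(value >> (56 - 8 * i));
}

/* Hypothetical helper: rebuild the 64-bit value from 8 big-endian bytes. */
static uint64_t decode_u64_be(const unsigned char in[8])
{
    uint64_t value = 0;
    for (size_t i = 0; i < 8; i++)
        value = (value << 8) | in[i];
    return value;
}
```

The sender would write the 8 encoded bytes instead of &(p->blocksize), and the receiver would decode them after all 8 bytes have arrived.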

Upvotes: 1

Jamey Sharp

Reputation: 8511

Perhaps the read calls are returning without reading the full 8 bytes. I'd check what length they report they've read.

You might also find valgrind or strace informative for better understanding why your code is behaving this way. If you're getting short reads, strace will tell you what the syscalls returned, and valgrind will tell you that you're reading uninitialized bytes in your length variables.
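
To check the reported lengths directly in the program, a small debugging wrapper around read can print a message whenever fewer bytes come back than were asked for. This is a sketch for diagnosis only; read_logged is a hypothetical name, and it deliberately does not retry.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical debugging wrapper: calls read() once and reports on stderr
   when an error occurs or fewer bytes than requested were returned. */
static ssize_t read_logged(int fd, void *buf, size_t want)
{
    ssize_t got = read(fd, buf, want);
    if (got < 0)
        perror("read");
    else if ((size_t)got < want)
        fprintf(stderr, "short read: wanted %zu bytes, got %zd\n", want, got);
    return got;
}
```

Swapping this in for the two 8-byte read calls on the client should show whether the bad block_size coincides with a short read.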

Upvotes: 1

slebetman

Reputation: 114014

"I was under the impression that the TCP protocol ensures no data is lost and everything is received in correct order"

Yes, but that's all that TCP guarantees. It does not guarantee that the data is sent and received in a single packet, so a single read call may return only part of what the sender wrote. You need to accumulate the data in a buffer until you have the full block size you expect before copying it out.
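
The same caution applies on the sending side: POSIX allows write to accept fewer bytes than requested, so a careful sender loops as well. A minimal sketch, assuming a blocking socket; write_full is a hypothetical helper name, not something from the question.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes are sent.
   Returns 0 on success, -1 on error. */
static int write_full(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t w = write(fd, p + done, len - done);
        if (w < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        if (w == 0)
            return -1;           /* no progress: give up rather than spin */
        done += (size_t)w;
    }
    return 0;
}
```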

Upvotes: 1

caf

Reputation: 239321

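There's no guarantee that you will get all 8 bytes in one go when you read block_count and block_size. A single read may return fewer bytes than requested, which leaves the rest of the length field to be picked up by the next read.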
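
One common way to deal with this is a helper that loops until the requested number of bytes has arrived, much like the data loop the question already has. The sketch below assumes a blocking socket; read_full is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes from fd into buf, looping
   over short reads. Returns 0 on success, -1 on error or if the peer
   closes the connection before len bytes have arrived. */
static int read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t r = read(fd, p + done, len - done);
        if (r == 0)
            return -1;           /* end of stream before len bytes arrived */
        if (r < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t)r;
    }
    return 0;
}
```

With this, the client would call read_full(sockfd, &block_size, 8) instead of CHECK(read(sockfd, &block_size, 8)), and could use the same helper for the block data.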

Upvotes: 4
