Is this the correct way of splitting a large file?

Question

I need to split a large image or text file into multiple chunks of 10 bytes. These chunks will be sent via UDP to a server. The problem is: 1. Im unsure about this code. Is this a good way of splitting files? 2. The programs memory usage is pretty high. 400 KB only for this function.

int nChunks = 0;
char chunk[10];
FILE *fileToRead;
fileToRead = fopen(DEFAULT_FILENAME, "rb");

while (fgets(chunk, sizeof(chunk), fileToRead)) {
    char *data = malloc(sizeof(chunk));
    strcpy(data, chunk);

    packet *packet = malloc(sizeof(packet));
    packet->header = malloc(sizeof(packetHeader));
    packet->header->acked = 0;
    packet->header->id = ++nChunks;
    packet->header->last = 0;
    packet->header->timestamp = 0;
    packet->header->windowSize = 10;
    packet->data = data;

    list_append(packages, packet);
}


typedef struct packetHeader{
    ...
}packetHeader;

typedef struct packet{
    packetHeader *header;
    void *data;
}packet;

Sergey Kalinichenko · Accepted Answer

Is this a good way of splitting files?

When the file consists of ten-character chunks of text, this is OK; since in your case the file is an image, not a text, you should use fread instead of fgets.

In addition, you should be passing sizeof(chunk), not sizeof(chunk)+1 as the size. Otherwise, fgets writes an extra byte past the end of the memory space allocated by your program.

Finally, you shouldn't be using strcpy to copy general data: use memcpy instead. In fact, you can avoid copying altogether if you manage buffers inside the loop, like this:

#define CHUNK_SIZE 10
...
// The declaration of chunk is no longer necessary
// char chunk[10];
...
for (;;) {
    char *data = malloc(CHUNK_SIZE);
    size_t len = fread(data, CHUNK_SIZE, 1, fileToRead);
    if (len == 0) {
        free(data);
        break;
    }
    // At this point, the ownership of the "data" can be transferred
    // to packet->data without making an additional copy.
    ...
}

The programs memory usage is pretty high. 400 KB only for this function.

This is because reading a file goes much faster than sending it via UDP. That's why the program manages to read the whole file into memory before the first few UDP packets get sent. You will be better off reading and sending in the same loop, rater than queuing up a whole list of packets upfront. When you read as you send, the slower part of the process controls the overall progress. This would let your program use a tiny portion of the memory space that it uses now, because there would be no buffering.

Is this the correct way of splitting a large file?

Answers (1)

Related Questions