Using realloc() instead of fixed-length buffers in c?

I have some code that was written to be as conservative as possible with memory use, so it does things like use realloc() for building strings a character at a time instead of a one-time fixed-length buffer as is commonplace. The advantage I see to this is you don't ever over-use memory in environments where that's important and you don't have to worry about the buffer being too small for corner cases and writing some realloc() code anyway to cover that. My question is if this thrashes memory horribly or can cause any other problems, or if perhaps the compiler optimizes this away.

Here is a sample program that reads a text file using no fixed-length buffers and it works fine. Why would you or would you not write code like this? Is it a patently bad idea?

// Compile with: gcc a.c -o a
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[]) {

  void *tmp;
  char **array = NULL; // array of line strings
  size_t len = 0; // length of a line
  int i = 0; // line count

  // Open file
  FILE* f;
  if (!(f = fopen("./test.txt", "r"))) {
    printf("Err: open file\n");
    return 1;
  }

  // Read file into memory
  int chr;
  while ((chr = fgetc(f)) != EOF) {
    // Reserve mem for another line
    if (len == 0) {
      // add 2 -- 1 for null terminator, 1 cuz i starts at 0
      if ((tmp = realloc(array, (i+2)*sizeof(*array))) == NULL) {
        printf("Err: realloc\n");
        return 1;
      }
      array = tmp;
      array[i] = NULL;
      array[i+1] = NULL;
    }

    // Increase mem for next char in the line
    // add 2 -- 1 for null terminator, 1 cuz len starts at 0
    // Q: Why is best practice to omit sizeof knowing that for char it is always 1... all other alloc's need to use sizeof so the inconsistency leads to mistakes
    if ((tmp = realloc(array[i], (len+2)*sizeof(*(array[i])))) == NULL) {
    // if ((tmp = realloc(array[i], len+2)) == NULL) {
      printf("Err: realloc\n");
      return 1;
    }
    array[i] = tmp;
    array[i][len] = (char)chr;
    array[i][len+1] = '\0';
    len++;

    // newline
    if (chr == '\n') {
      len = 0;
      i++;
    }
  }

  if (ferror(f)) {
    printf("Err: read file\n");
    return 1;
  }

  // Verify input
  printf("\nRead:\n-----\n");
  for (i=0; array[i] != NULL; i++) { // array is NULL terminated
    printf("[%d] %s", i+1, array[i]);
    free(array[i]);
  }
  free(array);

  printf("\nDone\n");
  fclose(f);
  return 0;
}

Upvotes: 0

Answers (2)

Chris

Reputation: 36620

As noted in comments, the issue with growing your dynamically allocated buffer one unit at a time is that you have to call realloc every single time, and realloc is potentially very expensive as it may involve copying the data currently allocated to a different place in memory, as opposed to "growing" into adjacent memory.

As such, you'd prefer to grow it as few times as possible. While the ideal growth rate is subject to some discussion, we can safely assume it's not +1, but closer to x2. This is straightforward. If you know how to grow the array by adding 1, you can multiply the size by 2.

What will also help you is taking this logic out of main so that:

Your code is cleaner and more expressive of your intent without worrying about how the work gets done.
It's easier to test individual pieces of the puzzle to make sure you got the logic right.
It's easier to repeat.

We'd probably start out with a struct like:

typedef struct mystr {
    size_t len;
    size_t cap;
    char *str;
} mystr_t;

And then we'd create a bunch of functions to interact with values of this type.

mystr_t *mystr_new();
void mystr_free(mystr_t *s);
int mystr_full(mystr_t *s);
mystr_t *mystr_from_cstr(const char *cs);
mystr_t *mystr_grow(mystr_t *s);
mystr_t *mystr_cpy(mystr_t *dst, const char *src);
mystr_t *mystr_cat(mystr_t *dst, const char *src);
void mystr_debug(mystr_t *s);
mystr_t *mystr_optimize(mystr_t *s);

Now, writing a function to read a line from a file is pretty straightforward:

#define LINE_BUF_SIZE 1024

mystr_t *readline(FILE *fp) {
    if (!fp) return NULL;

    mystr_t *s = mystr_new();
    if (!s) return NULL;

    char buf[LINE_BUF_SIZE];
    while (1) {
        if (!fgets(buf, LINE_BUF_SIZE, fp)) break;

        size_t len = strlen(buf);
        if (buf[len - 1] == '\n') {
            buf[len - 1] = '\0';
            mystr_cat(s, buf);
            break;
        }
        else {
            mystr_cat(s, buf);
        }
    }

    return s;
}

Now, reading a line from standard input in main looks really easy.

int main(void) {
    mystr_t *line = readline(stdin);

    mystr_debug(line);
    mystr_free(line);
}

Upvotes: 0

XeuVon

Reputation: 1

When you call malloc() or realloc() for the first time, he gonna allocated a big chunk of memory. That for future call and not asking for more memory at every single use.

realloc() Before everything look behind the allocated block and see if he can extend the block with some free space. if he can't he malloc() a new memory space for you. That a good strategy, but that don't work that well in your case.

The problem here is you realloc() multiple pointer. Every time he malloc() a new block for you, he need to copy the previous block in the new one. After that he free your first block and return the new one. So when you try to realloc() multiple pointer, your memory look like a mess and you just create some fragmentation.

Other thing, memory is really cheap now day's, you can allocated much more. To compare with c++ container implementation, they alloc the space they need * 2, avoiding linear time to increase performance. You lose a lot of time with all the realloc() call's.

Upvotes: 0

Using realloc() instead of fixed-length buffers in c?

Answers (2)

Related Questions