Reputation: 126
I am writing a program in C that needs to read from stdin. I don't want it to allocate more memory than necessary, so I am reading the input in chunks, malloc
ing more memory each time a new chunk is read.
Here is the code (the allocd
variable is just for tracking how much memory it has allocated):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZ 20
int main(int argc, char *argv[])
{
char *str = malloc(1), *p = NULL;
*str = '\0';
char buf[SIZ];
int bufs = 0;
int allocd = 0;
while (p = fgets(buf, sizeof(buf), stdin))
{
/* grow str */
str = realloc(str, bufs * SIZ + SIZ);
allocd = bufs * SIZ + SIZ;
strcat(str, buf);
bufs++;
if (!p)
break;
}
printf("ALLOC'D: %i", allocd);
free(str);
}
For testing, I have a file called file.txt
, which has 966 characters, as you can see when I use wc
:
$ wc -m file.txt
966 file.txt
The problem is that my program seems to be allocating far more bytes of memory than there are characters in the file, as you can see:
$ ./code <file.txt
ALLOC'D: 1680
Why is this happening, and how can I fix it?
Upvotes: 1
Views: 102
Reputation: 12708
I am writing a program in C that needs to read from stdin. I don't want it to allocate more memory than necessary, so I am reading the input in chunks, mallocing more memory each time a new chunk is read.
Well, your savings intentions are good idea for a programmer, but you are wrong in the savings, because you are not taking into account many things that are hidden to you, but necessary to support an efficient implementation to malloc
.
malloc(1)
will need to use 8bytes + 1
(this last the one you requested) to manage your task. This means that if you make a million such allocations you will have 1 million bytes allocated in your accountability, but you will be wasting 8 millions in malloc
overhead. The number of mallocs you have active counts.malloc(1)
in your code can be avoided. If you read the documentation of realloc()
, you'll see that realloc
doesn't need to have a non-null pointer to operate, if you pass a NULL
pointer to it, it will behave like the initial malloc()
call, but whith the real amount of storage you need.The approach in your code is correct, you use a single active malloc all the time, you have decided to grow in steps of SIZ
(a large SIZ
is good to minimize the overhead of malloc
calls, but you will, in the average, incurr in an overhead of not used memory ---memory allocated, but not filled with characters, of around half of the value of SIZ
, probably more) As the line length is supposed to follow a poison distribution, the best value for SIZ
would be the average line length (or better if you use twice that average, for better performance)
Your code, once corrected, would be:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZ 60 /* assumed an average line length of 30 chars */
int main(int argc, char *argv[])
{
char *str = NULL; /* <<< use null, don't allocate something you don't need */
char buf[SIZ];
/* you don't need to know how many times you repeated the loop */
int allocd = 0; /* allocated capacity */
int strsz = 0; /* filled size */
while (fgets(buf, sizeof(buf), stdin)) /* the p pointer is not necessary */
{
/* grow str */
int read_chars = strlen(buf); /* (1 & 2) see below */
printf("read: [%s]\n", buf);
int pos_to_cp = strsz; /* (3) we need this at the end
*/
strsz += read_chars;
if (strsz >= allocd) { /* need to grow */
printf("growing from %d to %d\n", allocd, allocd + (int)sizeof buf);
allocd += sizeof buf; /* new size */
str = realloc(str, allocd); /* reallocate to allocd */
}
strcpy(str + pos_to_cp, buf); /* (3) see below */
/* (4) see below */
}
printf("ALLOC'D: %i\n", allocd);
printf("string: %s\n", str);
free(str);
}
(1) The read_chars
represents the size of the read string, and it will mark the point where we need to copy the string in buf
.
(2) We don't use a pointer variable here, because as a result of realloc
, the original pointer can change, so we must evaulate the point of copy once we have the new pointer.
(3) We use pointer arithmetic here to find the point to copy the string. This way we end copying always (at the same cost) a simple string of size sizeof buf
, and not appending to a longer and longer string as we iterate on the buffer.
(4) You don't need to check for if (!p)
because if p
is NULL
you never enter in the loop, so the check is useless.
The problem with your program was that your were assuming that the buffer was always filled, so you always needed to grow, which is not true, while fgets
stops at the receiving of one \n
character. So the grow of the buffer is not always needed. I have interspersed some traces in the program, so you can follow it on execution.
Upvotes: 1
Reputation: 211740
You're allocating a new chunk of memory for each line you (attempt to) read regardless of how long the line actually is. I'd argue fgets
is the wrong tool here, that what you want is fread
instead:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define SIZ 20
int main(int argc, char *argv[])
{
char *str = malloc(1);
*str = '\0';
char buf[SIZ];
int allocd = 0;
int p;
// Note: fread() returns size_t number of records read, NOT a char*
while ((p = fread(buf, 1, sizeof(buf), stdin)))
{
str = realloc(str, allocd + p + 1);
// Concatenate the buffer
memcpy(str + allocd, buf, p);
allocd += p;
}
str[allocd + 1] = 0;
printf("ALLOC'D: %i", allocd);
free(str);
}
Upvotes: 3