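I have the following code that can dynamically allocate a single sentence:
int size = 256;                         /* assumed maximum: sentence + '\0' */
char *text = malloc(size * sizeof(char));
if (text == NULL)
    return 1;                           /* allocation failed */
/* note: sizeof(text) is the size of the pointer, not of the buffer */
if (fgets(text, size, stdin) != NULL) {
    text[strcspn(text, "\n")] = '\0';   /* remove the trailing newline */
    printf ("Sentence = <%s>\n", text);
}
free(text);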
I would like to be able to allocate and store multiple lines, each ending in '\n', so that I can work with them further (format them). I don't know how many lines there will be or how long each one is. The input ends with EOF. It doesn't have to be done with fgets. For example:
Any ideas?
This is the classic question of how to handle dynamic allocation and reallocation to store an unknown number of strings. It is worth understanding this process in detail, as it will serve as the basis for just about any other circumstance where you are reading an unknown number of values (whether they are structs, floats, characters, etc.).
There are a number of different data structures you can employ (lists, trees, etc.), but the basic way (as you call it, a "2D char array") is handled by creating a pointer-to-pointer-to-type (with type being char in this case) that points to an allocated block of pointers, and then, as your data is read, allocating space for each line, filling it with data, and assigning the starting address of the new block of memory to the next pointer. The short-hand for pointer-to-pointer-to-type is simply double-pointer (e.g. char **array;, which is technically a pointer-to-pointer-to-char, or pointer-to-char* if you like).
The general, and efficient, approach to allocating memory for an unknown number of lines is to first allocate a reasonably anticipated number of pointers (1 for each anticipated line). This is much more efficient than calling realloc and reallocating the entire collection for every line you read. You simply keep a counter of the number of lines read, and when you reach your original allocation limit, you reallocate twice the number of pointers you currently have. Note, you are free to grow by any amount you choose. You can simply add a fixed amount each time, or you can use some scaled multiple of the original; it's up to you. Reallocating to twice the current size is just one of the standard schemes.
When allocating your pointers originally, and as part of your reallocation, you can benefit by setting each pointer to NULL. This is easily accomplished for the original allocation: simply use calloc instead of malloc. On reallocation, you must set all newly allocated pointers to NULL yourself, because realloc does not initialize the memory it adds.
Why? It isn't mandatory, but doing so allows you to iterate over your array of pointers without knowing the number of lines. How does that work? For example, suppose you initialized 100 pointers to NULL and, as you read, assigned one line to each pointer in order. To iterate over the collection you can simply do:
size_t i = 0;
while (array[i]) {
    /* ... do your stuff with array[i] ... */
    i++;
}
Only pointers you have assigned something to will have a value, so the loop iterates only over those pointers, stopping when the first NULL pointer is encountered (the first NULL simply serves as your sentinel value telling you when to stop). For this to work, you must always keep at least one unused (NULL) pointer after the last line, which is why the program below reallocates as soon as the last allocated pointer is filled. This also lets you pass a pointer to your collection to any function without also passing the number of lines/values it contains. (Note: there is no reason not to pass the size of the collection as well, but there are circumstances where not needing it is a benefit.)
The example below uses the traditional approach of iterating over a fixed number of lines to print the lines, and then free the memory allocated, but there is no reason you couldn't simply iterate over the valid pointers in both cases to accomplish the same thing.
The same holds true when allocating storage for your lines. If you use calloc instead of malloc, you initialize all bytes to 0 (nul). The storage for each string is then guaranteed to be nul-terminated by virtue of the initialization, as long as you copy in at most one character less than you allocated. The same applies to allocating numeric arrays. By initializing all values to 0, you prevent any accidental attempt to read an uninitialized value (undefined behavior). This is generally not a problem when you fill and read an array sequentially, but when values are stored and retrieved in no particular order, it can be a real problem.
When allocating memory, you must validate that each call succeeded (e.g. for malloc, calloc, realloc, and for other functions that allocate for you, like strdup). It is just a simple check, but get in the habit of doing it every time; otherwise a failed allocation leaves you reading and writing through a NULL pointer instead of valid memory. In the example below, simple functions are used as wrappers for calloc and realloc to provide the necessary checks. While there is no requirement to use similar helper functions, they help keep the main body of your code free from repetitive memory checks that can make the code more difficult to read.
A final note on realloc: always use a temporary pointer to hold the return of realloc. Why? On success, realloc returns a pointer to the newly allocated block of memory. On failure, it returns NULL and leaves the original block untouched. If you assign the return directly to your original pointer and the request fails, you have overwritten the only copy of the original address, so you lose access to all of your previously stored values (and can no longer free them). Once you validate that realloc succeeded, simply assign your temporary pointer to the original.
The example below reads all lines from the file named as the first argument to the program (or from stdin by default). It uses fgets to read each line, and each line is checked to ensure it fit completely in the buffer. If a line is too long to fit into the space provided, a simple warning is given, and the remainder is read into the following line of storage (you could instead realloc and concatenate here). All lines are stored in array. There are MAXL (64) pointers originally allocated, and each line is read into a buffer of MAXC (256) characters. You can change either to meet your needs, or set MAXL to 1 to force reallocation to begin with the very first line. The lines are simply printed to the terminal, and then all memory is freed before the program exits.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXC 256    /* max chars per-line */
#define MAXL  64    /* initial num lines */

void *xcalloc (size_t n, size_t s);
void *xrealloc_dp (void *ptr, size_t *n);

int main (int argc, char **argv) {

    char **array = NULL;
    char buf[MAXC] = {0};
    size_t i, idx = 0, maxl = MAXL;
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    array = xcalloc (maxl, sizeof *array);  /* allocate maxl pointers */

    while (fgets (buf, MAXC, fp))   /* read all lines from fp into array */
    {
        size_t len = strlen (buf);

        /* validate complete line read */
        if (len + 1 == MAXC && buf[len - 1] != '\n')
            fprintf (stderr, "warning: line[%zu] exceeded '%d' chars.\n",
                        idx, MAXC);

        /* strip trailing '\r', '\n' */
        while (len && (buf[len-1] == '\n' || buf[len-1] == '\r'))
            buf[--len] = 0;

        /* allocate & copy buf to array[idx], nul-terminate
         * note: this can all be done with array[idx++] = strdup (buf);
         */
        array[idx] = xcalloc (len + 1, sizeof **array);
        strncpy (array[idx], buf, len);
        array[idx++][len] = 0;

        /* realloc as required (note: maxl passed as pointer) */
        if (idx == maxl) array = xrealloc_dp (array, &maxl);
    }
    if (fp != stdin) fclose (fp);

    printf ("\n lines read from '%s'\n\n", argc > 1 ? argv[1] : "stdin");
    for (i = 0; i < idx; i++)
        printf (" line[%3zu] %s\n", i, array[i]);

    for (i = 0; i < idx; i++)
        free (array[i]);    /* free each line */
    free (array);           /* free pointers */

    return 0;
}

/* simple calloc with error checking */
void *xcalloc (size_t n, size_t s)
{
    void *memptr = calloc (n, s);
    if (memptr == 0) {
        fprintf (stderr, "xcalloc() error: virtual memory exhausted.\n");
        exit (EXIT_FAILURE);
    }

    return memptr;
}

/* realloc array of pointers ('ptr') to twice the current
 * number of pointers ('*n'). Note: 'n' is a pointer to the
 * current number so that its updated value is preserved.
 * No element size is required, as it is known (simply the
 * size of a pointer).
 */
void *xrealloc_dp (void *ptr, size_t *n)
{
    void **p = ptr;
    void *tmp = realloc (p, 2 * *n * sizeof tmp);
    if (!tmp) {
        fprintf (stderr, "xrealloc_dp() error: virtual memory exhausted.\n");
        exit (EXIT_FAILURE);
    }
    p = tmp;
    memset (p + *n, 0, *n * sizeof tmp); /* set new pointers NULL */
    *n *= 2;

    return p;
}
Compile
gcc -Wall -Wextra -O3 -o bin/fgets_lines_dyn fgets_lines_dyn.c
Use/Output
$ ./bin/fgets_lines_dyn dat/captnjack.txt
lines read from 'dat/captnjack.txt'
line[ 0] This is a tale
line[ 1] Of Captain Jack Sparrow
line[ 2] A Pirate So Brave
line[ 3] On the Seven Seas.
Memory Leak/Error Check
In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed. It is imperative that you use a memory error checking program to ensure you haven't written beyond/outside your allocated block of memory and to confirm that you have freed all the memory you have allocated. For Linux, valgrind is the normal choice. There are so many subtle ways to misuse a block of memory that can cause real problems that there is no excuse not to do it. There are similar memory checkers for most platforms. They are all simple to use: just run your program through one.
$ valgrind ./bin/fgets_lines_dyn dat/captnjack.txt
==22770== Memcheck, a memory error detector
==22770== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==22770== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==22770== Command: ./bin/fgets_lines_dyn dat/captnjack.txt
==22770==
lines read from 'dat/captnjack.txt'
line[ 0] This is a tale
line[ 1] Of Captain Jack Sparrow
line[ 2] A Pirate So Brave
line[ 3] On the Seven Seas.
==22770==
==22770== HEAP SUMMARY:
==22770== in use at exit: 0 bytes in 0 blocks
==22770== total heap usage: 6 allocs, 6 frees, 1,156 bytes allocated
==22770==
==22770== All heap blocks were freed -- no leaks are possible
==22770==
==22770== For counts of detected and suppressed errors, rerun with: -v
==22770== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)
Simply look for "All heap blocks were freed -- no leaks are possible" and "ERROR SUMMARY: 0 errors from 0 contexts". If you don't have both, go back and figure out why.
This ended up much longer than anticipated, but this is something worth understanding. Let me know if there is anything else you have questions on.