stringson

Reputation: 21

dynamically allocating a string with unknown size

I have to read a known number of names from input as a single string, with the names separated by spaces. I have to dynamically allocate memory for an array of strings, where each string holds one name:

    char** names;
    char ch;
    names = malloc(N*sizeof(char*)); /* N is defined */

    for(i=0; i<N; i++) {

Now I have to allocate for each string without using a defined number:

    i=0, j=0;
    while ((ch=getchar()) != '\n') {
         while (ch != ' ') {
              names[i][j++] = ch;
         }
         if (ch == ' ') {
              names[i][j] = '\0';
              i++;
         }
    }
    if (ch == '\n')
         names[i][j] = '\0';

Upvotes: 3

Views: 8733

Answers (5)

ryyker

Reputation: 23208

For a known number of strings, you have allocated the char ** correctly:

char** names;
names = (char**) malloc(N*sizeof(char*));

Note, because the cast is not necessary in C, you could write it like this:

names = malloc(N*sizeof(char*));

For allocating memory as you read the file, for strings of unknown length, use the following approach:

  1. allocate a buffer of a known starting size using malloc or calloc (calloc is cleaner)
  2. read into the buffer until you run out of space
  3. use realloc to increase the size of the buffer by some increment (for example, double it)
  4. repeat steps 2 and 3 until the input has been read

Also, when working with buffers of unknown length whose contents you would like pre-set to zero, consider using calloc() over malloc(). It is a cleaner option.
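The numbered steps above can be sketched roughly as follows. This is a minimal illustration, not code from the question: read_line_dyn is a hypothetical helper name, the starting size of 16 is an arbitrary assumption, and it reads from any FILE * (pass stdin for the asker's case).

```c
#include <stdio.h>
#include <stdlib.h>

/* read_line_dyn is a hypothetical helper name, not from the answer.
 * Reads one line from fp into a heap buffer that doubles when full.
 * Returns NULL on allocation failure or if EOF is hit before any
 * character is read; the caller frees the result. */
char *read_line_dyn(FILE *fp)
{
    size_t cap = 16, len = 0;          /* starting size is arbitrary */
    char *buf = calloc(cap, 1);        /* step 1: zeroed initial buffer */
    int c;                             /* int, so EOF can be detected */

    if (!buf) return NULL;

    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (len + 1 == cap) {          /* step 2: full (room kept for '\0') */
            char *tmp = realloc(buf, cap * 2);   /* step 3: double it */
            if (!tmp) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) { free(buf); return NULL; }

    buf[len] = '\0';   /* memory added by realloc is not zeroed */
    return buf;
}
```

Assigning the result of realloc to a temporary pointer first means the original buffer can still be freed if the reallocation fails.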

Upvotes: 3

user3629249

Reputation: 16540

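Use readline() (from the GNU Readline library) or getline() (POSIX) to obtain a pointer to allocated memory that contains the input line. You are responsible for freeing that memory when you are done with it.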

Then use something like sscanf() or strtok() to extract the individual name strings into members of an array.
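A rough sketch of that idea for the asker's case (a known number of names on one line), assuming a POSIX system where getline() and strdup() are available. split_names is a hypothetical helper name introduced only for illustration.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline() and strdup() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* split_names is a hypothetical helper: reads one line from fp and
 * stores up to n space-separated names in names[]. Returns the number
 * of names stored, or -1 if no line could be read. The caller frees
 * each names[i]. */
int split_names(FILE *fp, char **names, int n)
{
    char *line = NULL;   /* getline allocates and grows this buffer */
    size_t cap = 0;
    int count = 0;

    if (getline(&line, &cap, fp) == -1) {
        free(line);
        return -1;
    }
    for (char *tok = strtok(line, " \n"); tok && count < n;
         tok = strtok(NULL, " \n")) {
        names[count] = strdup(tok);   /* each name gets its own copy */
        if (!names[count]) break;     /* stop on allocation failure */
        count++;
    }
    free(line);
    return count;
}
```

With names allocated as in the question, a call such as split_names(stdin, names, N) would fill the array without the caller ever needing to know the length of any name.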

Upvotes: 0

K K

Reputation: 1

When you say,

char** names;
char ch;
names = malloc(N*sizeof(char*));

This creates names, a pointer to pointer (char**) with room to store the addresses of N strings.

Ex: if you have 32 strings, then N is 32, so the allocation is 32 * sizeof(char*). Where sizeof(char*) is 4 bytes (typical of 32-bit systems), 128 bytes will be allocated; where it is 8 bytes (typical of 64-bit systems), 256 bytes.

After that you did this,

names[i][j++] = ch;

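The above expression is wrong, because names[i] does not yet point to any allocated memory. Only the array of pointers has been allocated, not the strings themselves, so writing characters through names[i] is undefined behavior.

You need to allocate memory for each pointer in names before storing characters through it.

Or you need to assign to each pointer the address of a substring within the main string.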
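Both options can be sketched as below. The helper names copy_name and point_names are hypothetical, and the second function assumes the whole input line is already held in a writable buffer.

```c
#include <stdlib.h>
#include <string.h>

/* Option 1 (copy_name, hypothetical): give names[i] its own storage
 * before any characters are written through it. */
int copy_name(char **names, int i, const char *src, size_t len)
{
    names[i] = malloc(len + 1);      /* memory for this one name */
    if (!names[i]) return -1;
    memcpy(names[i], src, len);
    names[i][len] = '\0';            /* names[i][j] is now valid */
    return 0;
}

/* Option 2 (point_names, hypothetical): store addresses that point
 * inside `line` itself, overwriting each separating space with '\0'.
 * No per-name allocation; the pointers are valid only as long as
 * `line` is. Returns the number of names found (at most n). */
int point_names(char *line, char **names, int n)
{
    int count = 0;
    char *p = line;

    while (*p && count < n) {
        while (*p == ' ') p++;          /* skip separators */
        if (!*p) break;
        names[count++] = p;             /* start of a name */
        while (*p && *p != ' ') p++;    /* find its end */
        if (*p) *p++ = '\0';            /* terminate in place */
    }
    return count;
}
```

Option 1 costs one allocation per name but leaves each string independent. Option 2 avoids the copies, at the price of modifying the original line and tying every name's lifetime to it.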

Upvotes: 0

David C. Rankin

Reputation: 84531

This is the classic question of how to handle dynamic allocation and reallocation to store an unknown number of strings (with the twist of separating each string into individual tokens before saving them to the array). It is worth understanding this process in detail, as it will serve as the basis for just about any other circumstance where you are reading an unknown number of values (whether they are structs, floats, characters, etc.).

There are a number of different types of data structures you can employ (lists, trees, etc.), but the basic approach is to create an array of pointers-to-type (with type being char in this case), accessed through a pointer-to-pointer-to-type, and then, as your data is read, to allocate a new block of memory, fill it with data, and assign its starting address to the next pointer. The short-hand for pointer-to-pointer-to-type is simply double-pointer (e.g. char **array;, which is technically a pointer-to-pointer-to-char, or pointer-to-char* if you like).

The general, and efficient, approach to allocating memory for an unknown number of tokens is to first allocate a reasonably anticipated number of pointers (1 for each anticipated token). This is much more efficient than calling realloc and reallocating the entire collection for every token you add to your array. Here, you simply keep a counter of the number of tokens added to your array, and when you reach your original allocation limit, you simply reallocate twice the number of pointers you currently have. Note, you are free to add any incremental amount you choose. You can simply add a fixed amount each time, or you can use some scaled multiple of the original; it's up to you. Reallocating to twice the current number is just one of the standard schemes.

What is "a reasonably anticipated number of pointers"? There is no precise number. You simply want to take an educated guess at the number of tokens you roughly expect and use that as the initial number of pointers to allocate. You wouldn't want to allocate 10,000 pointers if you only expect 100. That would be horribly wasteful. Reallocation will take care of any shortfall, so a rough guess is all that is needed. If you truly have no idea, then allocate some reasonable number, say 64 or 128. You can simply declare the limit as a constant at the beginning of your code, so it is easily adjusted, e.g.:

#define MAXPTR 128

or accomplish the same thing using an anonymous enum

enum { MAXPTR = 128 };

When allocating your pointers originally, and as part of your reallocation, you can benefit by setting each pointer to NULL. This is easily accomplished for the original allocation: simply use calloc instead of malloc. On reallocation, you must set all newly allocated pointers to NULL yourself. The benefit is that the first NULL acts as a sentinel indicating the point at which your valid pointers stop. As long as you ensure you have at least one NULL preserved as a sentinel, you can iterate without knowing the precise number of pointers filled, e.g.:

size_t i = 0;
while (array[i]) {
    ... do your stuff ...
    i++;
}
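As a small illustration of the sentinel idea, the following sketch counts and frees the entries of a NULL-terminated pointer array. count_ptrs and free_ptrs are hypothetical helper names, and both assume the array was obtained from calloc (or had its unused slots set to NULL), so at least one NULL follows the last filled pointer.

```c
#include <stdlib.h>

/* count_ptrs (hypothetical): number of filled pointers, found by
 * walking the array until the first NULL sentinel. */
size_t count_ptrs(char **array)
{
    size_t i = 0;
    while (array[i])
        i++;
    return i;
}

/* free_ptrs (hypothetical): free every string up to the sentinel,
 * then the block of pointers itself. */
void free_ptrs(char **array)
{
    for (size_t i = 0; array[i]; i++)
        free(array[i]);
    free(array);
}
```

Neither function needs a separate element count, which is the convenience the sentinel buys. Without a guaranteed NULL at the end, both loops would read past the allocated block.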

When you are done using the allocated memory, you want to ensure you free it. While in a simple piece of code the memory is freed on exit, get in the habit of tracking the memory you allocate and freeing it when it is no longer needed.

As for this particular task, you will want to read a line of an unknown number of characters into memory and then tokenize (separate) the string into tokens. getline will read and allocate memory sufficient to hold a character string of any size. You can do the same thing with any of the other input functions; you just have to code the repeated checks and reallocations yourself. If getline is available (it is specified by POSIX.1-2008, so most Unix-like systems provide it), use it. Then it is just a matter of separating the input into tokens with strtok or strsep. You will then want to duplicate each token to preserve it in its own block of memory and assign the location to your array of tokens. The following provides a short example.

Included in the example are several helper functions for opening files, allocating and reallocating. All they do is simple error checking, which helps keep the main body of your code clean and readable. Look over the example and let me know if you have any questions.

#define _POSIX_C_SOURCE 200809L  /* expose getline, strdup, ssize_t */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXL  64    /* initial number of pointers  */

/* simple helper/error check functions */
FILE *xfopen (const char *fn, const char *mode);
void *xcalloc (size_t n, size_t s);
void *xrealloc_dp (void *ptr, size_t *n);

int main (int argc, char **argv) {

    char **array = NULL;
    char *line = NULL;
    size_t i, idx = 0, maxl = MAXL, n = 0;
    ssize_t nchr = 0;
    FILE *fp = argc > 1 ? xfopen (argv[1], "r") : stdin;

    array = xcalloc (maxl, sizeof *array);    /* allocate maxl pointers */

    while ((nchr = getline (&line, &n, fp)) != -1)
    {
        while (nchr > 0 && (line[nchr-1] == '\r' || line[nchr-1] == '\n'))
            line[--nchr] = 0; /* strip carriage return or newline   */

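        char *p = line;  /* pointer to use with strtok */
        for (p = strtok (line, " \n"); p; p = strtok (NULL, " \n")) {

            array[idx] = strdup (p);  /* allocate & copy  */
            if (!array[idx]) {        /* validate the allocation */
                fprintf (stderr, "strdup() error: virtual memory exhausted.\n");
                exit (EXIT_FAILURE);
            }
            idx++;

            /* check limit reached  - reallocate */
            if (idx == maxl) array = xrealloc_dp (array, &maxl);
        }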
    }
    free (line);  /* free memory allocated by getline */
    if (fp != stdin) fclose (fp);

    for (i = 0; i < idx; i++)  /* print all tokens */
        printf (" array[%2zu] : %s\n", i, array[i]);

    for (i = 0; i < idx; i++)  /* free all memory  */
        free (array[i]);
    free (array);

    return 0;
}

/* fopen with error checking */
FILE *xfopen (const char *fn, const char *mode)
{
    FILE *fp = fopen (fn, mode);

    if (!fp) {
        fprintf (stderr, "xfopen() error: file open failed '%s'.\n", fn);
        // return NULL;
        exit (EXIT_FAILURE);
    }

    return fp;
}

/* simple calloc with error checking */
void *xcalloc (size_t n, size_t s)
{
    void *memptr = calloc (n, s);
    if (memptr == 0) {
        fprintf (stderr, "xcalloc() error: virtual memory exhausted.\n");
        exit (EXIT_FAILURE);
    }

    return memptr;
}

/*  realloc array of pointers ('ptr') to twice the current
 *  number of pointers ('*n'). Note: 'n' is a pointer
 *  to the current number so that its updated value is preserved.
 *  no pointer size is required as it is known (simply the size
 *  of a pointer).
 */
void *xrealloc_dp (void *ptr, size_t *n)
{
    void **p = ptr;
    void *tmp = realloc (p, 2 * *n * sizeof tmp);
    if (!tmp) {
        fprintf (stderr, "%s() error: virtual memory exhausted.\n", __func__);
        exit (EXIT_FAILURE);
    }
    p = tmp;
    memset (p + *n, 0, *n * sizeof tmp); /* set new pointers NULL */
    *n *= 2;

    return p;
}

Input File

$ cat dat/captnjack.txt
This is a tale
Of Captain Jack Sparrow
A Pirate So Brave
On the Seven Seas.

Output

$ ./bin/getline_strtok <dat/captnjack.txt
 array[ 0] : This
 array[ 1] : is
 array[ 2] : a
 array[ 3] : tale
 array[ 4] : Of
 array[ 5] : Captain
 array[ 6] : Jack
 array[ 7] : Sparrow
 array[ 8] : A
 array[ 9] : Pirate
 array[10] : So
 array[11] : Brave
 array[12] : On
 array[13] : the
 array[14] : Seven
 array[15] : Seas.

Memory/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address of the block of memory so that (2) it can be freed when it is no longer needed. It is imperative that you use a memory error checking program to ensure you haven't written beyond/outside your allocated block of memory and to confirm that you have freed all the memory you have allocated. For Linux, valgrind is the normal choice. There are so many subtle ways to misuse a block of memory that can cause real problems that there is no excuse not to do it. There are similar memory checkers for every platform. They are all simple to use. Just run your program through it.

$ valgrind ./bin/getline_strtok <dat/captnjack.txt
==26284== Memcheck, a memory error detector
==26284== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==26284== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==26284== Command: ./bin/getline_strtok
==26284==
 array[ 0] : This
 array[ 1] : is
<snip>
 array[14] : Seven
 array[15] : Seas.
==26284==
==26284== HEAP SUMMARY:
==26284==     in use at exit: 0 bytes in 0 blocks
==26284==   total heap usage: 18 allocs, 18 frees, 708 bytes allocated
==26284==
==26284== All heap blocks were freed -- no leaks are possible
==26284==
==26284== For counts of detected and suppressed errors, rerun with: -v
==26284== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 2 from 2)

What you want to confirm each time is "All heap blocks were freed -- no leaks are possible" and "ERROR SUMMARY: 0 errors from 0 contexts".

Upvotes: 5

MikeCAT

Reputation: 75062

How about growing the buffer gradually, for example, by doubling the size of buffer when the buffer becomes full?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

char *read_string(void) {
    size_t allocated_size = 2;
    size_t read_size = 0;
    char *buf = malloc(allocated_size); /* allocate initial buffer */
    if (buf == NULL) return NULL;
    for(;;) {
        /* read next character */
        int input = getchar();
        if (input == EOF || isspace(input)) break;
        /* if there isn't enough buffer */
        if (read_size >= allocated_size - 1) {
            /* allocate new buffer */
            char *new_buf = malloc(allocated_size *= 2);
            if (new_buf == NULL) {
                /* failed to allocate */
                free(buf);
                return NULL;
            }
            /* copy data read to new buffer */
            memcpy(new_buf, buf, read_size);
            /* free old buffer */
            free(buf);
            /* assign new buffer */
            buf = new_buf;
        }
        buf[read_size++] = input;
    }
    buf[read_size] = '\0';
    return buf;
}

int main(void) {
    int N = 5;
    int i;

    char** names;
    names = malloc(N*sizeof(char*));
    if(names == NULL) return 1;
    for(i=0; i<N; i++) {
        names[i] = read_string();

    }
    for(i = 0; i < N; i++) {
        puts(names[i] ? names[i] : "NULL");
        free(names[i]);
    }
    free(names);
    return 0;
}

Note: They say you shouldn't cast the result of malloc() in C.

Upvotes: 3
