user12050586
user12050586

Reputation:

Join strings in C with a separator

I am trying to write a function to do a join of strings with a separator. Here is what I have so far:

int main(void) {

    char * strings[] = {"A", "B", NULL};
    char ** copied_strings = malloc(sizeof strings);

    // Join strings with a separator
    char * separator = "XXX";
    size_t num_array_elements = (sizeof strings / sizeof * strings) - 1; // because last element is NULL
    size_t len_separator = strlen(separator);
    size_t len_strings = 0;
    for (int i=0; strings[i] != NULL ;i++) len_strings += strlen(strings[i]);
    size_t malloc_buffer_size = len_strings + (len_separator * (num_array_elements -1)) + 1;
    printf("Separator: %s | Len Array: %lu | Len Strings: %lu | Malloc Buffer Size: %lu\n", separator, num_array_elements, len_strings, malloc_buffer_size);
    char * joined_string_buffer = malloc(malloc_buffer_size);
    join_strings(joined_string_buffer, copied_strings, separator);

}

void join_strings(char * joined_string_buffer, char ** src, char * separator) {

    size_t sep_len = strlen(separator);

    while (*src) {
        size_t string_len = strlen(*src);
        for (int i=0; i<string_len; i++)
            *joined_string_buffer++ = (*src)[i];
        for (int i=0; i<sep_len; i++)
            *joined_string_buffer++ = separator[i];
        *src++;
    }

    *joined_string_buffer = '\0';

}

However, it seems like I'm not properly copying the characters to the *joined_string_buffer. How would I properly join the strings here?

Upvotes: 0

Views: 3006

Answers (4)

It is generally not a good idea to do the byte-fiddling yourself. Your code is testimony of that. Here are the most pronounced issues that I see:

  • join_strings() writes the result string into a buffer that's allocated by its caller. How does the caller know how much space will be needed? It doesn't know. It'll just make a guess, and if that guess is too small, all hell breaks loose.

  • The strlen() calls need to iterate their entire argument string to find the terminating null byte. Then your code iterates the same byte sequence again, reloading the data into the CPU. One of these passes over the input string could be optimized away, but only by you, I don't think there are sufficiently smart compilers out there yet to do this optimization.

  • Your join_strings() implementation is needlessly complex. Complexity makes code hard to think about, and hard to think about code attracts bugs.

  • The copying of the input strings in main() is pointless. It only adds to the complexity, makes the code hard to reason about, and attracts bugs. You are better of without it.

Ok, so how could you do better? Well, by using the appropriate standard library functions. In this case, the best solution is to use a stream:

void write_strings(FILE* stream, int count, char** src, char* separator) {
    for(int i = 0; i < count; i++) {
        fprintf(stream, "%s%s", src[i], separator);
    }
}

Three lines, one loop, one simple fprintf() call.

This function can be used to write some stuff to stderr for instance: write_strings(stderr, 2, (char*){"Streams", "Rock"}, ". ") will print the message "Streams. Rock. " to the error stream.

But how do you get the result into a string? Simple. Use open_memstream():

char* join_strings(int count, char** src, char* separator) {
    char* result = NULL;
    size_t length = 0;
    FILE* stream = open_memstream(&result, &length);
    write_strings(stream, count, src, separator);
    fclose(stream);
    return result;
}

There is not even a loop in this function, it's basically just a call-through to write_strings().

The returned string is automatically allocated by open_memstream(), which means that there is zero danger of overrunning a buffer. That alone should be reason enough to use. It also makes this join_strings() version dead simple to use: The caller does not need to decide just how much space to allocate, it simply puts in the strings, and it gets out the joined string. Exactly what it needs.

You may notice that I have changed the function's signature to include a count argument. This change is not required, but my experience tells me that it's a much better choice than a NULL terminated list: You can easily forget to add a NULL terminator, but you cannot forget providing a required function parameter. And having the count up-front allows many algorithms to be implemented in a much more straight-forward way. So I default to always associating arrays with a corresponding size argument, even if I could avoid it if I wanted.

Anyway, how does all this change the way the function is used? Not really all that much. The most important change is, that we don't need to calculate a size for the resulting buffer in main(). Since we can also strip the copy of the input strings, the updated main() boils down to this short, simple piece of code:

int main(void) {
    char * strings[] = {"A", "B"};    //no terminator necessary
    size_t num_array_elements = sizeof strings / sizeof * strings;
    char * separator = "XXX";

    printf("Separator: %s | Len Array: %lu\n", separator, num_array_elements);
    char * joined_string_buffer = join_strings(num_array_elements, strings, separator);

    //Added by me: Print the result to stdout
    printf("Resulting string: \"%s\" (%d characters)\n", joined_string_buffer, strlen(joined_string_buffer));

    //Cleanup: Free the buffer allocated by `open_memstream()`
    free(joined_string_buffer);
}

Note that there is not a single size calculation in the entire code (all three functions), nor is there a single explicit malloc() call. There is one free(), but that's all about the required memory management, the heavy lifting is performed by the standard library functions.

Upvotes: 1

David C. Rankin
David C. Rankin

Reputation: 84579

Presuming that if you are adding a separator, you only want to place the separator between words in your strings array, you will need to add conditional logic to your join_stings function to only add the separator before the second (and subsequent) strings in your strings array.

There are a number of ways to approach the problem, but given that your strings array has a sentinel NULL, you could do something like:

char *joinstr (const char **s, const char *sep)
{
    char *joined = NULL;                /* pointer to joined string w/sep */
    size_t lensep = strlen (sep),       /* length of separator */
        sz = 0;                         /* current stored size */
    int first = 1;                      /* flag whether first term */

    while (*s) {                        /* for each string in s */
        size_t len = strlen (*s);
        /* allocate/reallocate joined */
        void *tmp = realloc (joined, sz + len + (first ? 0 : lensep) + 1);
        if (!tmp) {                     /* validate allocation */
            perror ("realloc-tmp");     /* handle error (adjust as req'd) */
            exit (EXIT_FAILURE);
        }
        joined = tmp;                   /* assign allocated block to joined */
        if (!first) {                   /* if not first string */
            strcpy (joined + sz, sep);  /* copy separator */
            sz += lensep;               /* update stored size */
        }
        strcpy (joined + sz, *s++);     /* copy string to joined */
        first = 0;                      /* unset first flag */
        sz += len;                      /* update stored size */
    }

    return joined;      /* return joined string */
}

Adding a short main() to test the joinstr function above, you could do:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
...    
int main (void) {

    const char *strings[] = {"A", "B", NULL},
        *sep = "XXX";
    char *joined = joinstr (strings, sep);  /* join strings */

    printf ("%s\n", joined);    /* output joined string */
    free (joined);              /* free memory */
}

Example Use/Output

$ ./bin/joinwsep
AXXXB

Memory Use/Error Check

In any code you write that dynamically allocates memory, you have 2 responsibilities regarding any block of memory allocated: (1) always preserve a pointer to the starting address for the block of memory so, (2) it can be freed when it is no longer needed.

It is imperative that you use a memory error checking program to insure you do not attempt to access memory or write beyond/outside the bounds of your allocated block, attempt to read or base a conditional jump on an uninitialized value, and finally, to confirm that you free all the memory you have allocated.

For Linux valgrind is the normal choice. There are similar memory checkers for every platform. They are all simple to use, just run your program through it.

$ valgrind ./bin/joinwsep
==17127== Memcheck, a memory error detector
==17127== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==17127== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==17127== Command: ./bin/joinwsep
==17127==
AXXXB
==17127==
==17127== HEAP SUMMARY:
==17127==     in use at exit: 0 bytes in 0 blocks
==17127==   total heap usage: 2 allocs, 2 frees, 8 bytes allocated
==17127==
==17127== All heap blocks were freed -- no leaks are possible
==17127==
==17127== For counts of detected and suppressed errors, rerun with: -v
==17127== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

Always confirm that you have freed all memory you have allocated and that there are no memory errors.

Look things over and let me know if you have further questions.

Upvotes: 1

Jonathan Leffler
Jonathan Leffler

Reputation: 754480

There are many problems in the code, but they're mostly details. Unfortunately, in programming, even the details have to be right. This code fixes most of the problems — most of which were identified in the comments.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void join_strings(char *joined_string_buffer, char **src, char *separator);

int main(void)
{
    char *strings[] = { "A", "B", NULL };
    char *separator = "XXX";
    size_t num_array_elements = (sizeof strings / sizeof *strings) - 1;  // because last element is NULL
    size_t len_separator = strlen(separator);

    size_t len_strings = 0;
    for (int i = 0; strings[i] != NULL; i++)
        len_strings += strlen(strings[i]);

    size_t malloc_buffer_size = len_strings + (len_separator * (num_array_elements - 1)) + 1;
    printf("Separator: '%s' | Len Array: %zu | Len Strings: %zu | Malloc Buffer Size: %zu\n",
           separator, num_array_elements, len_strings, malloc_buffer_size);
    char *joined_string_buffer = malloc(malloc_buffer_size);
    if (joined_string_buffer == 0)
    {
        fprintf(stderr, "failed to allocate %zu bytes of memory\n", malloc_buffer_size);
        exit(EXIT_FAILURE);
    }

    join_strings(joined_string_buffer, strings, separator);

    printf("[[%s]]\n", joined_string_buffer);
    free(joined_string_buffer);
    return 0;
}

static void join_strings(char *joined_string_buffer, char **src, char *separator)
{
    size_t sep_len = strlen(separator);

    while (*src)
    {
        size_t string_len = strlen(*src);
        for (size_t i = 0; i < string_len; i++)
            *joined_string_buffer++ = (*src)[i];
        for (size_t i = 0; i < sep_len; i++)
            *joined_string_buffer++ = separator[i];
        src++;
    }

    *joined_string_buffer = '\0';
}

Example output

Separator: 'XXX' | Len Array: 2 | Len Strings: 2 | Malloc Buffer Size: 6
[[AXXXBXXX]]

Note that the 'separator' is more strictly a 'terminator' — it appears after the last item in the list as well as in between.

The code shown is a more or less direct fix-up of the code in the question. But the split of work between the code in main() and in the join_strings() function is not good. This is a better separation of duties:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *join_strings(char **src, char *separator);

int main(void)
{
    char *strings[] = { "A", "B", NULL };
    char *separator = "XXX";
    char *result = join_strings(strings, separator);

    printf("[[%s]]\n", result);
    free(result);

    return 0;
}

static char *join_strings(char **src, char *separator)
{
    size_t len_sep = strlen(separator);
    size_t num_str = 0;

    for (size_t i = 0; src[i] != NULL; i++)
        num_str++;

    size_t len_str = 0;
    for (int i = 0; src[i] != NULL; i++)
        len_str += strlen(src[i]);

    size_t buf_len = len_str + (len_sep * num_str) + 1;
    printf("Separator: '%s' | Len Array: %zu | Len Strings: %zu | Malloc Buffer Size: %zu\n",
           separator, num_str, len_str, buf_len);
    char *result = malloc(buf_len);
    if (result == 0)
    {
        fprintf(stderr, "failed to allocate %zu bytes of memory\n", buf_len);
        exit(EXIT_FAILURE);
    }

    char *dst = result;
    for (size_t i = 0; src[i] != NULL; i++)
    {
        char *str = src[i];
        for (size_t j = 0; str[j] != '\0'; j++)
            *dst++ = str[j];
        for (size_t i = 0; i < len_sep; i++)
            *dst++ = separator[i];
    }

    *dst = '\0';
    return result;
}

The output from this is the same as before — with the same wart as before w.r.t 'separator' vs 'terminator'.

Upvotes: 1

Gyubong Lee
Gyubong Lee

Reputation: 73

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void join_strings(char* joined_string_buffer, const char* src[], const char* separator);

int main(void) {

    const char* strings[] = { "A", "B", NULL };
    char** copied_strings = (char**) malloc(sizeof strings);

    // Join strings with a separator
    const char* separator = "XXX";
    size_t num_array_elements = (sizeof strings / sizeof * strings) - 1; // because last element is NULL
    size_t len_separator = strlen(separator);
    size_t len_strings = 0;
    for (int i = 0; strings[i] != NULL;i++) len_strings += strlen(strings[i]);
    size_t malloc_buffer_size = len_strings + (len_separator * (num_array_elements - 1)) + 1;
    printf("Separator: %s | Len Array: %lu | Len Strings: %lu | Malloc Buffer Size: %lu\n", separator, num_array_elements, len_strings, malloc_buffer_size);
    char* joined_string_buffer = (char*) malloc(malloc_buffer_size);

    join_strings(joined_string_buffer, strings, separator);

    // Result is AXXXBXXX
    printf("%s\n", joined_string_buffer);

}

void join_strings(char* joined_string_buffer, const char* src[], const char* separator) {

    size_t sep_len = strlen(separator);

    while (*src) {
        size_t string_len = strlen(*src);
        for (int i = 0; i < string_len; i++)
            *joined_string_buffer++ = (*src)[i];
        for (int i = 0; i < sep_len; i++)
            *joined_string_buffer++ = separator[i];
        *src++;
    }

    *joined_string_buffer = '\0';

}

I suppose you made a mistake to select second argument of 'join_strings'

Upvotes: 1

Related Questions