Igor
Igor

Reputation: 558

Parsing CSV Values in C

I am trying to write a basic CSV parser in C that generates a dynamic array of char* when given a char* and a separator character, such as a comma:

char **filldoc_parse_csv(char *toparse, char sepchar)
{
    char **strings = NULL;

    char *buffer = NULL;

    int j = 0;
    int k = 1;

    for(int i=0; i < strlen(toparse); i++)
    {
        if(toparse[i] != sepchar)
        {
            buffer = realloc(buffer, sizeof(char)*k);
            strcat(buffer, (const char*)toparse[i]);
            k++;
        }
        else
        {
            strings = realloc(strings, sizeof(buffer)+1);

            strings[j] = buffer;

            free(buffer);

            j++;

        }
    }

    return strings;


}

However, when I call the function as in the manner below:

char **strings = filldoc_parse_csv("hello,how,are,you", ',');

I end up with a segmentation fault:

Program received signal SIGSEGV, Segmentation fault.
__strcat_sse2 () at ../sysdeps/x86_64/multiarch/../strcat.S:166
166 ../sysdeps/x86_64/multiarch/../strcat.S: No such file or directory.
(gdb) backtrace
#0  __strcat_sse2 () at ../sysdeps/x86_64/multiarch/../strcat.S:166
#1  0x000000000040072c in filldoc_parse_csv (toparse=0x400824 "hello,how,are,you", sepchar=44 ',') at filldocparse.c:20
#2  0x0000000000400674 in main () at parsetest.c:6

The problem is centered around allocating enough space for the buffer string. If I have to, I will make buffer a static array, however, I would like to use dynamic memory allocation for this purpose. How can I do it correctly?

Upvotes: 0

Views: 253

Answers (2)

chux
chux

Reputation: 153508

Various problems

  1. strcat(buffer, (const char*)toparse[i]); attempts to changes a char to a string.

  2. strings = realloc(strings, sizeof(buffer)+1); reallocates the same amount of space. sizeof(buffer) is the size of the pointer buffer, not the size of memory it points to.

  3. The calling function has no way to know how many entries in strings. Suggest andding a NULL sentinel.

Minor: better to use size_t rather than int. Use more descriptive names. Do not re-call strlen(toparse) repetitively. Use for(int i=0; toparse[i]; i++) . Make toparse a const char *

Upvotes: 2

Some programmer dude
Some programmer dude

Reputation: 409186

You have problems with your memory allocations. When you do e.g. sizeof(buffer) you will get the size of the pointer and not what it points to. That means you will in the first run allocate five bytes (on a 32-bit system), and the next time the function is called you will allocate five bytes again.

There are also many other problems, like you freeing the buffer pointer once you assigned the pointer to strings[j]. The problem with this is that the assignment only copies the pointer and not what it points to, so by freeing buffer you also free strings[j].

Both the above problems will lead to your program having undefined behavior, which is the most common cause of runtime-crashes.

You should also avoid assigning the result of realloc to the pointer you're trying to reallocate, because if realloc fails it will return NULL and you loose the original pointer causing a memory leak.

Upvotes: 2

Related Questions