Kijan
Kijan

Reputation: 286

C string split problem

I am writing small IRC Bot, and i need to split incoming messages for easier handling. I wrote a function get_word, which should split string. According to gdb and valgrind, problem is that function sometimes returns invalid pointer, and program fails when trying to free that pointer. Here is the code:

char **get_word(char *str) {
   char **res;
   char *token, *copy;
   int size = 1;
   for(int i = 0; str[i] != '\0'; i++) {
      if(str[i] == ' ') {
         while(str[i] == ' ') {
            i++;
         }
         size++;
      }
   }
   res = malloc((size + 1) * sizeof(char *));
   copy = strdup(str);
   token = strtok(copy, " ");
   for(int i = 0; token != NULL; i++) {
      res[i] = strdup(token);
      token = strtok(NULL, " ");
   }
   free(copy);
   res[size] = NULL;
   return res;
}

Upvotes: 0

Views: 503

Answers (4)

Dmitri
Dmitri

Reputation: 9385

As julkiewicz points out, your nested loops where you count the words could miss the terminating null on str. Additionally, if str ends with spaces, your current code would count an extra word.

You could replace this section:

int size = 1;
for(int i = 0; str[i] != '\0'; i++) {
   if(str[i] == ' ') {
      while(str[i] == ' ') {
         i++;
      }
      size++;
   }
}

..with something like this:

while (*str == ' ') str++;  // skip leading spaces on str
/* count words */
int size = 0;
char *s = str;
do {
  if (*s && *s != ' ') {
    size++;                       // non-space group found
    while (*s && *s != ' ') s++;  // skip to next space
  }
  while (*s == ' ') s++;          // skip spaces after words
} while (*s);

..which counts the starts of groups of non-space characters, rather than groups of spaces, and watches for the terminating null in the inner loops.

You might also consider changing:

for(int i = 0; token != NULL; i++) {

..to:

for(int i = 0; token && i < size; i++) {

..just as a slightly paranoid guard in case strtok finds more words than you counted (though it shouldn't).

Upvotes: 0

wildplasser
wildplasser

Reputation: 44250

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char **split (const char *str)
{
char **arr = NULL;
size_t cnt=0;
size_t pos, len;

arr = malloc (sizeof *arr);
for (pos = 0; str[pos]; pos += len) {
    char *dup;
    len = strspn(str+pos, " \t\r\n" );
    if (len) continue;

    len = strcspn(str+pos, " \t\r\n" );
    if (!len) break;

    arr = realloc (arr, (2+cnt) * sizeof *arr);
    dup = malloc (1+len);
    memcpy (dup, str+pos, len);
    dup [len] = 0;
    arr [cnt++] = dup;
    }
arr[cnt] = NULL;
return arr;
}

int main(int argc, char **argv)
{
char **zzz;

for( zzz = split( argv[1] ); *zzz; zzz++) {
    printf( "->%s\n", *zzz );
    }
return 0;
}

The reallocation is a bit clumsy (like in the OP) and improving it is left as an exercise to the reader 8-}

Upvotes: 0

julx
julx

Reputation: 9091

One problem I see is with your nested loops:

Consider this input: ' \0'

The function execution reaches the for loop, i == 0. Then the while loop is also entered. At the end of while loop i == 1. Now the incrementation statement from the for loop is executed and i == 2. So next you will be reading past the end of the string.

EDIT

I understand that size is the number of words found in the input. So I'd go for something like:

for (int i = 0; str[i] != '\0'; ++i) {
    if (str[i] != ' ' && (str[i + 1] == ' ' || str[i + 1] == '\0')) {
         // Counting endings of words
         size++;
    }
}

Upvotes: 3

jwodder
jwodder

Reputation: 57630

I think gdb may be complaining about the fact that you never check the return value of malloc (or strdup) to see if it's non-NULL.

Upvotes: -2

Related Questions