trrrrrrm

Reputation: 11812

How to split a string into tokens in C?

How to split a string into tokens by '&' in C?

Upvotes: 8

Views: 23085

Answers (5)

Fe2O3

Reputation: 8354

It's quite simple:

char str[] = "&a&&b&c"; // a mutable string

for (char *cp = str; (cp = strtok(cp, "&")) != NULL; cp = NULL) {
    /* do something with the token */
}

There's a single call to strtok(), a single instance of the delimiter string, and the scope of cp is contained within this loop.

The strchr() approach suggested in another answer handles only a single delimiter character; strcspn() would have been a better suggestion, because it accepts a whole set of delimiters. Notice also that with strchr() the last token has to be handled after the loop has finished, which is not a great design.
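
The strcspn() idea mentioned above can be sketched as follows. This is only an illustration, not code from any of the answers: the helper name print_fields is made up for the example. It walks a read-only string, treats every character in delims as a delimiter, keeps empty fields, and handles the last field inside the loop.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints every field of s and returns how many there were.
   Any character in delims ends a field; empty fields are kept. */
size_t print_fields(const char *s, const char *delims)
{
    size_t count = 0;

    for (;;) {
        size_t len = strcspn(s, delims);   /* length of the field starting at s */
        printf("[%.*s]\n", (int)len, s);
        ++count;
        if (s[len] == '\0')                /* stopped at the terminator: last field */
            break;
        s += len + 1;                      /* skip the delimiter itself */
    }
    return count;
}
```

Called with "a&&b;c" and the delimiter set "&;", this sees four fields, the second of them empty.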

An advantage of strtok() is that the tokens have been isolated in their current location. If their addresses are preserved in an array of pointers, they can be used repeatedly without isolating them over and over again.

Upvotes: 0

For me, the strtok() function is unintuitive and too complicated, so I wrote my own. It takes the string to split, the character that separates the tokens, and a pointer through which it returns the number of tokens found (useful when printing the tokens in a loop). A disadvantage of this function is the fixed maximum length of each token.

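#include <stdlib.h>
#include <string.h>
#define MAX_WORD_LEN 32   //Longer tokens are truncated to MAX_WORD_LEN - 1 characters

//Returns an array of *w_count strings; the caller must free each string and then the array.
char **txtspt(const char *text, char split_char, int *w_count)
{
    size_t len = strlen(text);
    if(len <= 1)
        return NULL;

    char **cpy0 = NULL;
    size_t i, j = 0, k = 0, words = 1;

    //Words counting (a trailing split_char does not start another word)
    for(i = 0; i < len; ++i)
    {
        if(text[i] == split_char && text[i + 1] != '\0')
        {
            ++words;
        }
    }
    //Memory reservation: one pointer per word, then one zero-filled buffer per word
    cpy0 = malloc(words * sizeof *cpy0);
    if(cpy0 == NULL)
        return NULL;
    for(i = 0; i < words; ++i)
    {
        cpy0[i] = calloc(MAX_WORD_LEN, 1);
        if(cpy0[i] == NULL)
        {
            while(i > 0)
                free(cpy0[--i]);
            free(cpy0);
            return NULL;
        }
    }

    //Splitting; the buffers are zero-filled, so every word is already terminated
    for(i = 0; i < len && k < words; ++i)
    {
       if(text[i] == split_char)
       {
           ++k;
           j = 0;
       }
       else if(text[i] != '\n' && j < MAX_WORD_LEN - 1)   //Skipping '\n' is helpful when using fgets()
       {
           cpy0[k][j++] = text[i];
       }
    }

    *w_count = (int)words;
    return cpy0;
}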

Upvotes: 0

R Samuel Klatchko

Reputation: 76541

strtok / strtok_r

char *token;
char *state;

for (token = strtok_r(input, "&", &state);
     token != NULL;
     token = strtok_r(NULL, "&", &state))
{
    ...
}
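
Because strtok_r() keeps its position in the caller-supplied state pointer rather than in hidden static storage, two tokenizations can be interleaved. Here is a minimal sketch of that, assuming a POSIX system (strtok_r() is not part of ISO C). The function name count_pairs and the key=value input layout are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L   /* ask for the POSIX declaration of strtok_r() */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: splits input on '&', then splits each piece on '='.
   Modifies input in place and returns the number of '&'-separated pieces. */
int count_pairs(char *input)
{
    int pairs = 0;
    char *outer_state;

    for (char *pair = strtok_r(input, "&", &outer_state);
         pair != NULL;
         pair = strtok_r(NULL, "&", &outer_state))
    {
        char *inner_state;                       /* independent of outer_state */
        char *key = strtok_r(pair, "=", &inner_state);
        char *value = strtok_r(NULL, "=", &inner_state);

        printf("%s -> %s\n", key ? key : "(none)", value ? value : "(none)");
        ++pairs;
    }
    return pairs;
}
```

The same nesting with plain strtok() would not work, because the inner calls would overwrite the single saved position that the outer loop relies on.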

Upvotes: 13

Cees Meijer

Reputation: 782

You can use the strtok() function as shown in the example below.

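#include <string.h>

#define MAX_STRING_LENGTH 256   /* assumed value; not given in the original */
#define MAX_TOKENS 16           /* assumed value; not given in the original */

/// Function to parse a string into separate tokens.
/// Returns the number of tokens stored in pToken[] (at most MAX_TOKENS).

int parse_string(char pInputString[MAX_STRING_LENGTH],char *Delimiter,
                   char *pToken[MAX_TOKENS])
{
  int i = 0;
  char *p = strtok(pInputString, Delimiter);

  while (p != NULL && i < MAX_TOKENS){
     pToken[i++] = p;
     p = strtok(NULL, Delimiter);
  }
  return i;
}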

sprintf(String,"Token1&Token2");
NrOfParameters = parse_string(String,"&",pTokens);

/// The array pTokens[] now contains pointers to the start of each token inside String,
/// which strtok() has modified by overwriting each delimiter with '\0'.
printf("%s, %s\n",pTokens[0],pTokens[1]);

Upvotes: 3

Alok Singhal

Reputation: 96141

I would do it something like this (using strchr()):

#include <string.h>

char *data = "this&&that&other";
char *next;
char *curr = data;
while ((next = strchr(curr, '&')) != NULL) {
    /* process curr to next-1 */
    curr = next + 1;
}
/* process the remaining string (the last token) */

strchr(const char *s, int c) returns a pointer to the next location of c in s, or NULL if c isn't found in s.
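
To make the "process curr to next-1" step concrete, here is a minimal sketch that prints each token using its length instead of modifying the string. The function name print_tokens is made up for this example. It keeps empty tokens and handles the last token after the loop, as in the snippet above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prints every sep-separated token of data, including
   empty ones, and returns the number of tokens. data is never modified.
   Assumes sep is not '\0'. */
int print_tokens(const char *data, char sep)
{
    int count = 0;
    const char *curr = data;
    const char *next;

    while ((next = strchr(curr, sep)) != NULL) {
        /* the token spans curr .. next-1, i.e. next - curr characters */
        printf("[%.*s]\n", (int)(next - curr), curr);
        ++count;
        curr = next + 1;
    }
    printf("[%s]\n", curr);   /* the last token runs to the end of the string */
    return count + 1;
}
```

For "this&&that&other" this reports four tokens, one of them empty, because adjacent delimiters are not merged.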

You might be able to use strtok(). However, I don't like it, because:

  • it modifies the string being tokenized, so it cannot be used on string literals and is awkward when you want to keep the string for other purposes. In that case, you must copy the string to a temporary buffer first.
  • it merges adjacent delimiters, so if your string was "a&&b&c", the returned tokens are "a", "b", and "c". Note that there is no empty token after "a".
  • it keeps its position in hidden static state, so it is not thread-safe.
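
The first two points can be seen in a small sketch. It copies the input to a temporary buffer before calling strtok(), so the original stays intact and may even be a string literal. The function name count_tokens_copy is made up for this example, and error handling is kept minimal.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the tokens strtok() finds in text without
   modifying text. Returns -1 if the temporary copy cannot be allocated. */
int count_tokens_copy(const char *text, const char *delims)
{
    size_t size = strlen(text) + 1;
    char *copy = malloc(size);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, size);            /* strtok() will write '\0' into copy */

    int count = 0;
    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims))
        ++count;                         /* adjacent delimiters never yield "" */

    free(copy);
    return count;
}
```

For "a&&b&c" this counts three tokens, whereas splitting at every '&' would give four fields.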

Upvotes: 9
