Avinash

Reputation: 13257

tokenizing a string in C

Hi, I want to tokenize a string in C.

Here is the string:

{Job Started}{Job Running}{Job Running}{Job Finished}

I want to tokenize on { and } so that I get "Job Started", "Job Running" and "Job Finished".

I also want the same delimiters to be usable as escaped characters:

{Job Started}{Job \{ID1\} Running}{Job \{ID2\} Running}{Job Finished}

should return the following:

Job Started, Job {ID1} Running, Job {ID2} Running, Job Finished.

I have a solution using pointer arithmetic, but I want to avoid iterating over the input string more than once.

Any suggestions?

Upvotes: 2

Views: 430

Answers (9)

Remo.D

Reputation: 16512

If that is your only scanning/tokenizing problem, you will probably be better off with the solution you already have, or with implementing an FSM as Ferruccio suggested.

If, on the other hand, you have other similar problems, you might look for a tool or library that can help. Someone suggested lex, but you could also use a regular expression library.

Given a string matching library you could write something like:

pmx_t ret;

ret = pmxMatchStr(src, "&e\\&K{(<*!}>)}&K{(<*!}>)}&K{(<*!}>)}&L");
if (ret) {
  printf("%.*s, %.*s, %.*s\n", pmxLen(ret,1), pmxStart(ret,1),
                               pmxLen(ret,2), pmxStart(ret,2),
                               pmxLen(ret,3), pmxStart(ret,3));
}

(This also handles spaces before or between the {...} groups and consumes the end of the line.)

Yes, the example is a shameless promotion of my library (pmx), but the same concept applies to any of the many regular expression libraries for C that you can find by searching for "regexp" or "regular expression library".

Upvotes: 0

Praveen S

Reputation: 10393

You can use sscanf. You may want to build a format string with the appropriate delimiters using the reference in the link.

/* sscanf example */

#include <stdio.h>

int main ()
{
  char sentence []="Rudolph is 12 years old";
  char str [20];
  int i;

  sscanf (sentence,"%19s %*s %d",str,&i); /* %19s: at most 19 chars + '\0' fit in str; %*s skips "is" */
  printf ("%s -> %d\n",str,i);

  return 0;
}

Output:

Rudolph -> 12

strtok and strtok_r (the reentrant version of strtok) can also be used to parse the string.

PS: I copied my example here from another question with similar requirements.

Upvotes: 0

Ferruccio

Reputation: 100658

You can use a simple finite state machine:

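#include <stdio.h>

int main(void) {
    const char *src = "{Job Started}{Job \\{ID1\\} Running}{Job \\{ID2\\} Running}{Job Finished}";

    char token[100] = {0}, *dst = token, ch;
    char *end = token + sizeof token - 1;   /* leave room for the terminator */

    /* 0 = outside braces, 1 = inside a token, 2 = just saw a backslash */
    int state = 0;
    while ((ch = *src++) != '\0') {
        switch (state) {
            case 0:
                if (ch == '{') state = 1;
                break;
            case 1:
                switch (ch) {
                    case '}':               /* end of token: emit it and reset */
                        printf("token: %s\n", token);
                        dst = token;
                        *dst = '\0';
                        state = 0;
                        break;
                    case '\\':              /* next character is taken literally */
                        state = 2;
                        break;
                    default:
                        if (dst < end) *dst++ = ch;
                        *dst = '\0';
                }
                break;
            case 2:                         /* copy the escaped character */
                if (dst < end) *dst++ = ch;
                *dst = '\0';
                state = 1;
                break;
        }
    }
    return 0;
}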

Upvotes: 5

Wladimir

Reputation: 59

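/* Rewrites ptr in place: "{a}{b \{c\}}" becomes "a,b {c}."
   The string must be writable (not a string literal). */
char *tokenizer(char *ptr) {
    char *str = ptr;   /* start of the result */
    char *aux = ptr;   /* write position; never ahead of ptr */

    while (*ptr) {
        /* \{ or \}: copy the brace literally and skip both characters */
        if ( *ptr == '\\' && ( *(ptr + 1) == '{' || *(ptr + 1) == '}') ) {
            *aux++ = *(ptr + 1);
            ptr += 2;
        }
        else if ( *ptr == '{') {   /* opening brace: drop it */
            ++ptr;
        }
        else if ( *ptr == '}' ) {  /* closing brace: ',' between tokens, '.' after the last */
            *aux++ = ( *(++ptr)  != '\0' ) ? ',' : '.';
        }
        else {
            *aux++ = *ptr++;
        }
    }
    *aux = '\0';
    return str;
}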

Upvotes: 0

jim mcnamara

Reputation: 16379

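#include <string.h>

/* Splits src into at most len tokens separated by any character in delim.
   tmp must be a writable buffer of at least strlen(src) + 1 bytes;
   result[] receives pointers into tmp (unused slots are NULL). */
char **
split( char **result, char *tmp, const char *src, const char *delim, size_t len)
{
   size_t i=0;
   char *p=NULL;
   for(i=0; i<len; i++) 
      result[i]=NULL;
   if(!*src)
      return result;
   strcpy(tmp, src);   /* strtok modifies tmp, not src */
   for(i=0, p=strtok(tmp, delim); p!=NULL && i<len; p=strtok(NULL, delim), i++ )
   {
      result[i]=p;
   }
   return result;
}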

This example does not modify the original string: you pass in a separate working buffer (tmp) for strtok to modify.

Upvotes: 0

tenfour

Reputation: 36896

Writing your own function to tokenize this should be pretty simple, especially if you know where the string is coming from (and don't need to worry about strange user input, for example {a}{, {{{{{, }a{, or {blah} {blah}).

Something like this [written quickly and untested!!]:

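#include <stdlib.h>
#include <string.h>

/* Splits inp in place into {...} tokens, removing the backslash from
   escaped characters. On success *outp points to a malloc'd array of
   token pointers (into inp) that the caller must free.
   Returns the number of tokens, or -1 if allocation fails. */
int tokenize(char *inp, char ***outp)
{
    char *i = inp;          // read position
    char *w = inp;          // write position, never ahead of i
    int tokenCount = 0;
    char **tokens;

    *outp = NULL;
    // one slot per input character is a safe upper bound on the # of tokens
    tokens = malloc(sizeof(char *) * (strlen(inp) + 1));
    if (tokens == NULL)
        return -1;

    while (*i != '\0')
    {
        switch (*i)
        {
            case '{':
                // start a new token at the current write position
                tokens[tokenCount++] = w;
                break;
            case '}':
                // terminate the current token. we assume a { comes next.
                *w++ = '\0';
                break;
            case '\\':
                if (*(i + 1) == '\0')
                    break;          // trailing backslash: drop it
                i = i + 1;
                // intentional fall-through: copy the escaped character
            default:
                *w++ = *i;
                break;
        }
        i = i + 1;
    }
    *w = '\0';

    *outp = tokens;
    return tokenCount;
}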

Upvotes: 1

Lombo

Reputation: 12235

If you want to extend the functionality, you could take a look at Eric Roberts' scannerADT. It's very straightforward to use, and you could add a setDelimiter method to it.

Here are the .c and .h for it.

Upvotes: 0

thevilledev

Reputation: 2377

I've used strtok() for this. It doesn't work for strings with escaped characters; I think it could be modified to handle them, although that isn't trivial. Hopefully this helps.

#include <stdio.h>
#include <string.h>
int main(void) {
    char str[] = "{Job Started}{Job Running}{Job Running}{Job Finished}";
    char* pch;
    pch = strtok(str,"{}");
    while(pch!=NULL) {
        printf("%s\n",pch);
        pch = strtok(NULL,"{}");
    }
    return 0;
}

Delnan has a point there: string manipulation in C is difficult and vulnerable to mistakes in pointer handling. If C isn't mandatory for your project, you should definitely use some other language.

Upvotes: 0

Michael F

Reputation: 40830

You can use strtok() with a delimiter set of {} (and whatever else you need). A sequence of two or more contiguous delimiter characters in the parsed string is considered to be a single delimiter, plus you can modify the delimiter set between successive calls. Also note that strtok() modifies the string given to it.

Edit: I realised this is not quite enough for your second requirement (escaped braces).

Upvotes: 1
