Reputation: 13257
Hi, I want to tokenize a string in C.
The string is:
{Job Started}{Job Running}{Job Running}{Job Finished}
I want to tokenize on { and } so that I get "Job Started", "Job Running", "Job Running" and "Job Finished".
I also want the same delimiters to be usable as escaped characters:
{Job Started}{Job \{ID1\} Running}{Job \{ID2\} Running}{Job Finished}
This should return the following:
Job Started, Job {ID1} Running, Job {ID2} Running, Job Finished.
I have a solution that uses pointer arithmetic, but I want to avoid iterating over the input string more than once.
Any suggestions?
Upvotes: 2
Views: 430
Reputation: 16512
If that is your only scanning/tokenizing problem, you are probably better off with the solution you already have, or with implementing an FSM as Ferruccio suggested.
If you have other similar problems, on the other hand, you might look for a tool or library that can help. Someone suggested lex, but you could also use a regular expression library.
Given a string matching library you could write something like:
pmx_t ret;
ret = pmxMatchStr(src,"&e\\&K{(<*!}>)}&K{(<*!}>)}&K{(<*!}>)}&L");
if (ret) {
printf("%.*s, %.*s, %.*s\n",pmxLen(ret,1),pmxStart(ret,1),
pmxLen(ret,2),pmxStart(ret,2),
pmxLen(ret,3),pmxStart(ret,3));
}
(This also handles spaces before or between the {...} groups and consumes the rest of the line.)
Yes, the example is a shameless promotion of my library (pmx), but the same concept applies to any of the many regular expression libraries for C that you can find by searching for "regexp" or "regular expression library".
Upvotes: 0
Reputation: 10393
You can use sscanf. You may want to create appropriate delimiters using the reference in the link.
/* sscanf example */
#include <stdio.h>
int main ()
{
char sentence []="Rudolph is 12 years old";
char str [20];
int i;
sscanf (sentence,"%19s %*s %d",str,&i); /* %19s: at most 19 chars plus '\0' fit in str */
printf ("%s -> %d\n",str,i);
return 0;
}
Output:
Rudolph -> 12
strtok() and strtok_r() (the reentrant version of strtok) can also be used to parse the string.
PS: I copied this example from my answer to another question with similar requirements.
Upvotes: 0
Reputation: 100658
You can use a simple finite state machine:
#include <stdio.h>
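int main(void) {
/* "\\{" in C source is the two characters \{ from the question */
const char *src = "{Job Started}{Job \\{ID1\\} Running}{Job \\{ID2\\} Running}{Job Finished}";
char token[100] = "", *dst = token, ch;
int state = 0; /* 0 = outside braces, 1 = inside a token, 2 = just saw a backslash */
while ((ch = *src++) != 0) {
switch (state) {
case 0:
if (ch == '{') state = 1; /* start of a token */
break;
case 1:
switch (ch) {
case '}': /* end of token: print it and reset the buffer */
printf("token: %s\n", token);
dst = token;
*dst = 0;
state = 0;
break;
case '\\': /* take the next character literally */
state = 2;
break;
default:
if (dst < token + sizeof token - 1) { /* leave room for '\0' */
*dst++ = ch;
*dst = 0;
}
}
break;
case 2:
if (dst < token + sizeof token - 1) {
*dst++ = ch;
*dst = 0;
}
state = 1;
break;
}
}
return 0;
}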
Upvotes: 5
Reputation: 59
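/* Removes the braces from ptr in place: each "}" becomes "," (or "." after
   the last one), and \{ and \} become { and }.
   ptr must point to a writable array, not a string literal. */
char *tokenizer(char *ptr) {
char *str = ptr; /* start of the result, returned to the caller */
char *aux = ptr; /* write position; never gets ahead of ptr */
while (*ptr) {
if ( *ptr == '\\' && ( *(ptr + 1) == '{' || *(ptr + 1) == '}') ) {
/* escaped brace: keep the brace, drop the backslash */
*aux++ = *(ptr + 1);
ptr += 2;
}
else if ( *ptr == '{') {
/* opening brace: drop it */
++ptr;
}
else if ( *ptr == '}' ) {
/* closing brace: separator, or '.' if it ends the string */
*aux++ = ( *(++ptr) != '\0' ) ? ',' : '.';
}
else {
*aux++ = *ptr++;
}
}
*aux = '\0';
return str;
}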
Upvotes: 0
Reputation: 16379
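#include <string.h>
/* Splits a copy of src on any character in delim. result must have room for
   len pointers, and tmp must hold at least strlen(src) + 1 bytes.
   At most len tokens are stored; unused slots are left NULL. */
char **
split( char **result, char *tmp, const char *src, const char *delim, size_t len)
{
size_t i=0;
char *p=NULL;
for(i=0; i<len; i++)
result[i]=NULL;
if(!*src)
return result;
strcpy(tmp, src); /* strtok() modifies tmp, not src */
for(i=0, p=strtok(tmp, delim); p!=NULL && i<len; p=strtok(NULL, delim), i++ )
{
result[i]=p;
}
return result;
}
This function does not modify the original string: you pass in a separate working buffer (tmp), and strtok() splits that copy instead.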
Upvotes: 0
Reputation: 36896
Writing your own function to tokenize this should be pretty simple, especially if you know where the string is coming from (and don't need to worry about strange user input such as {a}{, {{{{{, }a{ or {blah} {blah}).
Something like this [written quickly and untested!!]:
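#include <stdlib.h>
#include <string.h>
/* Splits inp in place: each {...} becomes a '\0'-terminated token, and a
   backslash makes the next character literal. On return *outp points to a
   malloc'd array of token pointers (the caller frees the array, not the
   tokens). Returns the number of tokens, or -1 if malloc fails. */
int tokenize(char* inp, char*** outp)
{
char *i = inp; // read position
char *w = inp; // write position; never passes i
int tokenCount = 0;
char **tokens;
*outp = NULL;
if(*i == 0)
return 0;
tokens = malloc(sizeof(char*) * strlen(inp)); // at most one token per '{', so this is enough
if(tokens == NULL)
return -1;
for(; *i != 0; i++)
{
switch(*i)
{
case '{':
// start a new token at the current write position
tokens[tokenCount++] = w;
break;
case '}':
// end the current token. we assume there is a { coming next.
*w++ = 0;
break;
case '\\':
if(i[1] == 0) // trailing backslash: nothing to escape
break;
i = i + 1;
// intentional fall-through: copy the escaped character
default:
*w++ = *i;
break;
}
}
*w = 0; // terminate a final token that had no closing }
*outp = tokens;
return tokenCount;
}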
Upvotes: 1
Reputation: 12235
If you want to extend the functionality, you could take a look at Eric Roberts's scannerADT. It's very straightforward to use, and you could add a setDelimiter method to it.
Here are the .c and .h files for it.
Upvotes: 0
Reputation: 2377
I've used strtok() for this. It doesn't work for strings with escaped characters; I think it could be modified to handle them, but that isn't trivial. Hopefully this gives you a starting point.
#include <stdio.h>
#include <string.h>
int main(void) {
char str[] = "{Job Started}{Job Running}{Job Running}{Job Finished}";
char* pch;
pch = strtok(str,"{}");
while(pch!=NULL) {
printf("%s\n",pch);
pch = strtok(NULL,"{}");
}
return 0;
}
Delnan has a point there: string manipulation in C is difficult and prone to pointer-handling bugs. If C isn't mandatory for your project, you should definitely use some other language.
Upvotes: 0
Reputation: 40830
You can use strtok() with a delimiter set of {} (and whatever else you need). A sequence of two or more contiguous delimiter characters in the parsed string is considered to be a single delimiter, and you can change the delimiter set between successive calls. Also note that strtok() modifies the string given to it.
edit: I realised this is not quite enough for your 2nd requirement.
Upvotes: 1