Abhishek Balaji R

Reputation: 675

fgets to stop when a space is encountered

I am trying to parse an input JSON file in C, and the contents of the file are something like: {"version":"0.1","type":"trace", "userid":0, "method":"udp-paris", "src":"10.20.6.191", "dst":"8.8.8.8", "sport":41687, "dport":33435, ...

The input file is quite big. I want to read it line by line and parse each line, i.e. separate the key-value pairs. Both the lines and the key-value fields are arbitrary in length.

I know it works fine with fscanf(), but I would like to use a function with bounds protection, such as fgets(). I'm not sure how to use fgets() in this case, because it reads up to n-1 bytes into the buffer, including tabs and spaces, and stops only after a newline. I want to stop as soon as a space, tab, or newline character is encountered, so that I can parse the characters read so far before reading any further.

Please note that fscanf() together with strtok() can do this, since it breaks at every space or newline. But sadly, it doesn't allow bounds on the buffer.

How do I go about this?
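For reference, the behaviour asked for here (an fgets()-style size limit that also stops at whitespace) can be sketched with fgetc(). This is a minimal illustration, not a library function: the name read_token and its return convention are assumptions made for this example. Note also that the %s conversion of fscanf() accepts a field width, as in "%100s", which bounds the read in a similar way.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: reads one whitespace-delimited token into buf,
 * storing at most size-1 characters plus a terminating '\0'.
 * Returns the token length, or -1 at EOF or if size is too small.
 * A token longer than size-1 is returned in pieces over several calls. */
static int read_token(FILE *fp, char *buf, size_t size)
{
    int c;
    size_t n = 0;

    if (size < 2)
        return -1;

    /* Skip leading spaces, tabs and newlines. */
    while ((c = fgetc(fp)) != EOF && isspace(c))
        ;
    if (c == EOF)
        return -1;

    /* Copy until whitespace, EOF, or the buffer is full. */
    do {
        buf[n++] = (char)c;
        if (n == size - 1)
            break;
    } while ((c = fgetc(fp)) != EOF && !isspace(c));

    buf[n] = '\0';
    return (int)n;
}
```

With a 5-byte buffer, the input word abcdefghij would come back as abcd, efgh and ij on three successive calls, so an over-long token is split rather than overflowing the buffer.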

Update:

The approach below works. Inside every conditional I have to call strtok again, and about six comparisons are made per token. I would like to know whether this can be done better.

/* buf must hold at least 101 bytes: %100s stores up to 100 characters plus '\0' */
while (fscanf(fp, "%100s", buf) != EOF)
{
    token = strtok(buf, ":-");
    while (token != NULL)
    {
        if (strcmp(token, "\"src\"") == 0)
        {
            head[trace_count] = (HEADER *) malloc(sizeof(HEADER));
            token = strtok(NULL, ":{,}])");
            strcpy(head[trace_count]->src_ip, token);
        }
        else if (strcmp(...))
        {
        }

        ...
    }
}

Upvotes: 1

Views: 3836

Answers (1)

Paul Ogilvie

Reputation: 25266

As lines can be any length, reading a whole line into a fixed-size buffer will always be a problem: the buffer may never be large enough. A reliable way around this is to process the input character by character. The following is a simple, basic parser for your syntax. Adapt as you need:

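#include <stdio.h>

/* Assumed definitions, not part of the original answer; adjust as needed. */
#define MAX_TOKEN 64
#define MAX_SVAL  256
#define TRUE  1
#define FALSE 0
void error(const char *msg);    /* assumed to report the message and abort */

void example(FILE *fin)
{
    int c;                      /* int, not char, so that EOF is detected reliably */
    char token[MAX_TOKEN], tokval[MAX_SVAL], *s= token;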
    int instr= FALSE;   // track whether we are in a string
    int intok= TRUE;    // track whether we are in a token name or a value

    while ((c=fgetc(fin)) != EOF)
    {
        if (instr) {
            if (c=='"')
                 {*s='\0'; instr= FALSE;}
            else *s++ = c;
        }
        else switch (c) {
        case '"': instr= TRUE; break;
        case '{': /* open:  whatever you want to do*/ break;
        case '}': /* close: whatever you want to do*/ break;
        case ':': if (intok)  {*s= '\0'; s= tokval; intok= FALSE; /* have token name now*/} else *s++ = ':'; break;
        case ',': if (!intok) {*s= '\0'; s= token;  intok= TRUE;  /* have a pair now    */} else *s++ = ','; break;
        case ' ': case '\t': case '\n': case '\r': break;
        default: *s++ = c;
        }
        if (intok)
             {if (s > token+MAX_TOKEN-2) error("token name too long");}
        else {if (s > tokval+MAX_SVAL-2) error("token value too long");}
    }   
}
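To make the idea above easier to try out, here is a self-contained variant of the same character-by-character approach. It is a sketch under stated assumptions, not the exact code above: the names parse_pairs, pair_fn, store_src and struct header, as well as the buffer sizes, are made up for this example. It hands each completed key/value pair to a callback when a ',' or '}' is seen, and it handles neither escape sequences nor nested objects.

```c
#include <stdio.h>
#include <string.h>

#define MAX_TOKEN 64    /* assumed limit for key names */
#define MAX_SVAL  256   /* assumed limit for values    */

/* Hypothetical callback type: receives each completed key/value pair. */
typedef void (*pair_fn)(const char *key, const char *val, void *ctx);

/* Reads flat objects such as {"a":"b", "n":1} from fin one character at
 * a time. Returns the number of pairs seen, or -1 if a key or value does
 * not fit in its buffer. */
static int parse_pairs(FILE *fin, pair_fn on_pair, void *ctx)
{
    char key[MAX_TOKEN], val[MAX_SVAL];
    size_t klen = 0, vlen = 0;
    int c, in_str = 0, in_key = 1, pairs = 0;

    while ((c = fgetc(fin)) != EOF) {
        if (!in_str) {
            if (c == '"') { in_str = 1; continue; }
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '{')
                continue;
            if (c == ':' && in_key) { in_key = 0; continue; }
            if ((c == ',' || c == '}') && !in_key) {
                /* A pair is complete: terminate both strings and report. */
                key[klen] = '\0';
                val[vlen] = '\0';
                on_pair(key, val, ctx);
                pairs++;
                klen = vlen = 0;
                in_key = 1;
                continue;
            }
            if (c == '}') continue;   /* empty object or stray brace */
        } else if (c == '"') {
            in_str = 0;
            continue;
        }
        /* Ordinary character: append to key or value with a bounds check. */
        if (in_key) {
            if (klen + 1 >= MAX_TOKEN) return -1;
            key[klen++] = (char)c;
        } else {
            if (vlen + 1 >= MAX_SVAL) return -1;
            val[vlen++] = (char)c;
        }
    }
    return pairs;
}

/* Hypothetical record, loosely modelled on the HEADER in the question. */
struct header { char src_ip[64]; };

/* Example callback: keeps only the value of the "src" key. */
static void store_src(const char *key, const char *val, void *ctx)
{
    struct header *h = ctx;
    if (strcmp(key, "src") == 0)
        snprintf(h->src_ip, sizeof h->src_ip, "%s", val);
}
```

A call such as parse_pairs(fp, store_src, &h) replaces the chain of strcmp/strtok branches with one callback, and because every append is bounds-checked, an over-long field is reported with -1 instead of overflowing a buffer.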

Upvotes: 2
