Reputation: 675
I am trying to parse an input JSON file in C. The contents of the file look something like: {"version":"0.1","type":"trace", "userid":0, "method":"udp-paris", "src":"10.20.6.191", "dst":"8.8.8.8", "sport":41687, "dport":33435, ...
The input file is quite big. I want to read it line by line and split each line into its key-value pairs; both the lines and the key-value fields are arbitrary in length.
I know this works fine with fscanf(), but I would like to use a function with bounds protection, such as fgets(). I'm not sure how to use fgets() here, because it reads up to n-1 characters into the buffer, including tabs, blank spaces (' ') and the newline. I want to stop as soon as a blank space, tab or newline is encountered, parse the characters read so far, and only then continue reading.
Please note that fscanf() can do this together with strtok(): it breaks at every space or newline. But sadly, it doesn't allow bounds on the buffer.
How should I go about this?
Update:
The approach below works. Inside every conditional I need to invoke strtok(), and around 6 comparisons are made. I would like to know whether this can be done better.
while(fscanf(fp, "%100s", buf) != EOF)
{
    token=strtok(buf,":-");
    while(token!=NULL)
    {
        if(strcmp(token,"\"src\"")==0)
        {
            head[trace_count]=(HEADER*) malloc(sizeof(HEADER));
            token=strtok(NULL,":{,}])");
            strcpy(head[trace_count]->src_ip,token);
        }
        else if(strcmp(...))
        {
        }
        ...
    }
}
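A note on the width used in the loop above: a width of 100 in %100s lets fscanf() store up to 100 characters followed by a terminating null, so the buffer needs at least 101 bytes. The following is only a minimal sketch of that bounded read; read_token and TOKEN_MAX are names made up for illustration and are not part of the code above.

```c
#include <stdio.h>

#define TOKEN_MAX 100  /* assumed limit; must match the width in the format string */

/* Hypothetical helper: reads one whitespace-delimited token into buf,
   which must hold TOKEN_MAX + 1 bytes. Returns 1 on success, 0 otherwise. */
int read_token(FILE *fp, char buf[TOKEN_MAX + 1])
{
    /* %100s stores at most 100 characters, then a terminating '\0'. */
    return fscanf(fp, "%100s", buf) == 1;
}
```

A token longer than the width is not truncated silently: the remainder is returned by the next call, so an over-long field shows up as two tokens.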
Upvotes: 1
Views: 3836
Reputation: 25266
Because lines can be arbitrarily long, reading a whole line into a fixed-size buffer will always be a problem: the buffer may never be large enough. The way around this is to process the input character by character. The following is a simple, basic parser for your syntax. Adapt it as needed:
/* MAX_TOKEN, MAX_SVAL, TRUE, FALSE and error() are assumed to be defined elsewhere */
void example(FILE *fin)
{
    int c;                  // int, not char, so that EOF can be told apart from data
    char token[MAX_TOKEN], tokval[MAX_SVAL], *s= token;
    int instr= FALSE;       // track whether we are in a string
    int intok= TRUE;        // track whether we are in a token name or a value
    while ((c=fgetc(fin)) != EOF)
    {
        if (instr) {
            if (c=='"')
                {*s='\0'; instr= FALSE;}
            else *s++ = c;
        }
        else switch (c) {
        case '"': instr= TRUE; break;
        case '{': /* open: whatever you want to do*/ break;
        case '}': /* close: whatever you want to do*/ break;
        case ':': if (intok) {*s= '\0'; s= tokval; intok= FALSE; /* have token name now*/} else *s++ = ':'; break;
        case ',': if (!intok) {*s= '\0'; s= token; intok= TRUE; /* have a pair now */} else *s++ = ','; break;
        case ' ': case '\t': case '\n': case '\r': break;
        default: *s++ = c;
        }
        // bounds check after each character
        if (intok)
            {if (s > token+MAX_TOKEN-2) error("token name too long");}
        else {if (s > tokval+MAX_SVAL-2) error("token value too long");}
    }
}
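For anyone who wants something to compile and try, here is a self-contained variation on the same character-by-character idea. It is only a sketch under a few assumptions: the input consists of flat objects like the sample in the question (no nested objects or arrays, no escaped quotes inside strings), and next_pair and put are hypothetical helper names, not part of any library.

```c
#include <stdio.h>

/* Append c to buf, keeping it null-terminated; returns 0 when buf is full. */
static int put(char *buf, size_t *len, size_t cap, int c)
{
    if (*len + 1 >= cap)
        return 0;
    buf[(*len)++] = (char)c;
    buf[*len] = '\0';
    return 1;
}

/* Hypothetical helper: reads the next "key":value pair from fin.
   Returns 1 if a pair was stored in key/val, 0 at end of input,
   -1 if the key or the value does not fit in its buffer. */
int next_pair(FILE *fin, char *key, size_t kcap, char *val, size_t vcap)
{
    size_t klen = 0, vlen = 0;
    int in_str = 0;   /* inside a quoted string? */
    int in_val = 0;   /* past the ':' that separates key and value? */
    int c;            /* int, so EOF is distinguishable from every char */

    if (kcap == 0 || vcap == 0)
        return -1;
    key[0] = val[0] = '\0';

    while ((c = fgetc(fin)) != EOF) {
        if (in_str) {
            if (c == '"') { in_str = 0; continue; }
            /* any other character inside a string is stored below */
        } else if (c == '"') {
            in_str = 1; continue;
        } else if (c == ':' && !in_val) {
            in_val = 1; continue;          /* key is complete */
        } else if (c == ',' || c == '}') {
            if (in_val)
                return 1;                  /* pair is complete */
            klen = 0; key[0] = '\0';       /* stray separator: start over */
            continue;
        } else if (c == '{' || c == ' ' || c == '\t' || c == '\n' || c == '\r') {
            continue;                      /* structure and whitespace are skipped */
        }
        if (!(in_val ? put(val, &vlen, vcap, c) : put(key, &klen, kcap, c)))
            return -1;
    }
    return in_val;  /* a final pair without a closing '}' is still reported */
}
```

It would be called in a loop, for example while (next_pair(fp, key, sizeof key, val, sizeof val) == 1), comparing key against "src", "dst" and so on inside the loop body. The bounds check happens before each store, so neither buffer can be overrun regardless of how long the line is.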
Upvotes: 2