Nikolai Hesterberg

Reputation: 3

Parsing comma-separated strings with spaces

I need to parse a string that contains two integers and two strings, all separated by commas. The strings contain spaces, which is causing issues. The format is integer, string [including spaces], string [including spaces], integer. I'm working on Linux with gcc, using the C99 standard.

I've tried various regex-style parsing methods. I have a solution that works if the strings don't have spaces, but it breaks when spaces are involved.

char *line = "5,some text, some more text with spaces, 3";
int num1, num2;
char string1[max_size];
char string2[max_size];

sscanf(line, "%d,%[^,],%[^,],%d", &num1, string1, string2, &num2);

I expect the variables to contain:

num1 == 5;
string1 == "some text";
string2 == "some more text with spaces";
num2 == 3;

I am not getting any compilation errors, but whenever the strings contain spaces, the parsed data comes out as junk.

Upvotes: 0

Views: 706

Answers (1)

Steve Summit

Reputation: 48020

Here is a straightforward rewrite of your code to use strtok:

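#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* An array rather than a pointer to a literal: strtok writes into it. */
    char line[] = "5,some text, some more text with spaces, 3";
    int num1, num2;
    char *string1;
    char *string2;

    /* Each call returns the next comma-delimited token of line. */
    num1 = atoi(strtok(line, ","));
    string1 = strtok(NULL, ",");
    string2 = strtok(NULL, ",");
    num2 = atoi(strtok(NULL, ","));

    printf("num1 = %d\n", num1);
    printf("str1 = \"%s\"\n", string1);
    printf("str2 = \"%s\"\n", string2);
    printf("num2 = %d\n", num2);
    return 0;
}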

This works, although it has these limitations:

  • I'm not checking strtok's return value to see if it returns NULL prematurely (indicating fewer than 4 fields in the input)
  • atoi has no error handling, either, and will quietly return 0 if the numeric fields aren't numeric
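  • overall, strtok is a pretty poor function, too (it keeps hidden state between calls, so it isn't reentrant and can't tokenize two strings at the same time)
  • strtok treats a run of consecutive delimiters as a single delimiter, so it skips over empty fields, which probably isn't what you want here (if the input line were, for example, something like "12,string,,34")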
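The limitations listed above can be worked around without strtok. What follows is only a minimal sketch, not part of the original answer: split_commas, parse_int and parse_record are hypothetical helper names made up for illustration. It splits on commas with strchr so that empty fields are kept, checks that exactly four fields are present, and uses strtol instead of atoi so that non-numeric input is reported rather than silently turned into 0.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split line in place on commas, keeping empty fields.
   Stores at most max_fields pointers in fields[] and returns the total
   number of fields found (which may exceed max_fields). */
size_t split_commas(char *line, char *fields[], size_t max_fields)
{
    size_t n = 0;
    char *p = line;

    assert(line != NULL && fields != NULL);
    for (;;) {
        char *comma = strchr(p, ',');
        if (n < max_fields)
            fields[n] = p;
        n++;
        if (comma == NULL)
            break;
        *comma = '\0';          /* terminate this field */
        p = comma + 1;          /* next field starts after the comma */
    }
    return n;
}

/* Hypothetical helper: convert a whole field to int.
   Returns 0 on success, -1 if the field is empty, non-numeric,
   out of range, or followed by anything other than blanks. */
int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    assert(s != NULL && out != NULL);
    errno = 0;
    v = strtol(s, &end, 10);    /* strtol skips leading whitespace itself */
    if (end == s || errno == ERANGE || v < INT_MIN || v > INT_MAX)
        return -1;
    while (*end == ' ' || *end == '\t')
        end++;
    if (*end != '\0')
        return -1;
    *out = (int)v;
    return 0;
}

/* Hypothetical helper: parse "int,string,string,int" from a modifiable line.
   Returns 0 on success, -1 on a malformed line. On success, *str1 and
   *str2 point into line. */
int parse_record(char *line, int *num1, char **str1, char **str2, int *num2)
{
    char *f[4];

    if (split_commas(line, f, 4) != 4)
        return -1;
    if (parse_int(f[0], num1) != 0 || parse_int(f[3], num2) != 0)
        return -1;
    *str1 = f[1];
    *str2 = f[2];
    return 0;
}
```

Calling parse_record on a modifiable copy of the example line should yield 5 and 3 for the integers. Like the strtok version, this sketch does not trim whitespace, so the second string keeps the space that follows the comma; trimming would be a separate step.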

Nevertheless, this is probably a step better than trying to use sscanf.

Note also that I changed line to an array, so that it's modifiable, since strtok inserts \0 characters into it to terminate the strings it tokenizes. (That's why string1 and string2 can be pointers now.)
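To make the in-place modification concrete, here is a small sketch (second_token_offset is a hypothetical name used only for this illustration). It reports where the second token starts inside the buffer, which shows that strtok hands back pointers into the caller's own array rather than copies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenize buf on commas and return the offset of the
   second token within buf, or -1 if there are fewer than two tokens.
   buf must be modifiable, because strtok overwrites delimiters with '\0'. */
ptrdiff_t second_token_offset(char *buf)
{
    char *tok;

    assert(buf != NULL);
    if (strtok(buf, ",") == NULL)
        return -1;
    tok = strtok(NULL, ",");
    return tok != NULL ? tok - buf : -1;
}
```

After the call, the comma that followed the first token has been replaced by '\0', so printing the buffer with %s shows only the first field. Passing a pointer to a string literal instead of an array would be undefined behavior, because the literal must not be written to.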

Upvotes: 1
