Read more than one word from string with sscanf

Question

I am trying to read formatted content from a file. To do so, I read line by line using fgets() and sscanf().

The content of the file is supposed to be a table. One row would look like the following example:

456    2    39    chained_words    62.5    // comment with more than one word

To read it, I use:

fgets(temp,MAXLINELENGTH,file);
sscanf(temp,"%d %d %d %s %f // %s",&num1,&num2,&num3,word,&num4,comment);

It works fine with the first five elements plus the first word after the //, but the problem is that I need to store the whole comment in the comment char * variable. I have tried multiple solutions proposed in other posts, like specifying format that excludes certain characters, but nothing worked.

I'd appreciate any hint to solve the problem!

David C. Rankin · Accepted Answer

Following on from your comment, if you were to add another number after the existing comment, that would complicate things a bit. The reason is that, with comment containing multiple words, there is no discrete end to search for.

However, C rarely lets you down. Whenever you need to parse data from a line or buffer, you look at the format of your data and ask "What am I going to use as my reference for the beginning or end of what I need?" Here, with nothing in comment to mark where it ends, we need to use the end of the buffer as a reference and work backwards.

Doing so, we are going to assume that the value is the last thing on the line before the newline (no tabs or spaces follow). We could loop backwards until we find the last non-whitespace character to validate this, but for our purposes here we simply make the assumption.

For this problem, we will break parsing the line into two parts. We can reliably read everything up to the comment with our original sscanf call. So we will consider everything in the first part of the line (up to and including the float) part 1, and everything after the comment characters // part 2. You read/parse part 1 as usual:

        sscanf (line, "%d %d %d %s %f", &d1, &d2, &d3, word, &f1);
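
sscanf returns the number of successful conversions, so part 1 can be validated before going any further. A minimal sketch of that check follows; the helper name parse_fields and the 127-character width limit (which assumes a 128-byte word buffer) are illustrative assumptions, not part of the code above.

```c
#include <stdio.h>

/* Hypothetical helper: parse the five leading fields of one row.
 * Returns 1 if all five conversions succeeded, 0 otherwise.
 * Assumption: word points to a buffer of at least 128 bytes. */
int parse_fields (const char *line, int *d1, int *d2, int *d3,
                  char *word, float *f1)
{
    /* %127s stores at most 127 characters plus the terminating nul */
    return sscanf (line, "%d %d %d %127s %f", d1, d2, d3, word, f1) == 5;
}
```

A caller could skip or report any line for which parse_fields returns 0 instead of continuing with stale values.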

To search for a specific character in a line, we can always do a manual char-by-char comparison, or we can use the strchr and strrchr functions from string.h, which search a string for the first (strchr) or last (strrchr) occurrence of the given character. Both functions return a pointer to that character within the string, or NULL if the character is not found.
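
To make the difference between the two functions concrete, here is a small sketch that converts the returned pointers into offsets. The helper names first_offset and last_offset are made up for illustration; only strchr and strrchr come from string.h.

```c
#include <string.h>

/* Hypothetical helper: offset of the first occurrence of c in s, or -1 */
long first_offset (const char *s, int c)
{
    const char *p = strchr (s, c);      /* NULL when c is not in s */
    return p ? (long) (p - s) : -1;
}

/* Hypothetical helper: offset of the last occurrence of c in s, or -1 */
long last_offset (const char *s, int c)
{
    const char *p = strrchr (s, c);     /* NULL when c is not in s */
    return p ? (long) (p - s) : -1;
}
```

For the string "62.5 // note", first_offset with '/' gives 5 and last_offset gives 6, so strrchr lands on the second slash of the comment marker.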

Working backwards from the end of our line, if we find /, we now have a pointer (the address within the string) to the last '/' before the beginning of the comment. We now read the entire remainder of the line into comment (value and all) using our pointer.

        p = strrchr (line, '/');            /* find last '/' in line    */
        sscanf (p, "/ %[^\n]%*c", comment); /* read comment and value   */
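
If a line might not contain a comment at all, strrchr returns NULL and the pointer must not be passed to sscanf. The sketch below wraps this step with that check. The helper name text_after_last_slash and the 128-byte output buffer are assumptions for illustration, and it also assumes the comment text itself contains no '/'.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy everything after the last '/' in line into out,
 * skipping leading whitespace and stopping at the newline.
 * Assumption: out has room for at least 128 bytes.
 * Returns 1 on success, 0 if there is no '/' or nothing follows it. */
int text_after_last_slash (const char *line, char *out)
{
    const char *p = strrchr (line, '/');

    if (!p)                             /* no comment marker on this line */
        return 0;

    /* the space in the format skips whitespace; %127[^\n] reads
     * at most 127 characters up to (not including) the newline */
    return sscanf (p, "/ %127[^\n]", out) == 1;
}
```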

Now we are working in only comment (instead of line). We know if we work backwards from the end of comment looking for a space ' ', we will be in a position to read our last value. After we read the last value, since we have our pointer pointing to the address right before the value, we know we can null-terminate comment at the pointer to finish our parse.

        p = strrchr (comment, ' ');         /* find last space in comment */
        sscanf (p, " %d", &d4);             /* read last value into d4  */
        *p = 0;                             /* null-terminate comment   */

(note: you can check for and remove any trailing spaces in comment if needed, but for our purposes, that is omitted)
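
When several spaces separate the comment from the value, terminating at the last space leaves the earlier spaces at the end of comment. A minimal sketch of the trimming step mentioned above, using a hypothetical helper named rtrim:

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: remove trailing whitespace from s in place */
void rtrim (char *s)
{
    size_t len = strlen (s);

    /* overwrite trailing whitespace with nul bytes, working backwards */
    while (len > 0 && isspace ((unsigned char) s[len - 1]))
        s[--len] = '\0';
}
```

Calling rtrim (comment) right after the null-termination step would leave only the comment text.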

Putting it all together you would have something that looked like this:

Quick Example

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXS 128

int main (int argc, char **argv) {

    if (argc < 2 ) {                /* check for at least 1 argument    */
        fprintf (stderr, "error: insufficient input, usage: %s filename\n",
                argv[0]);
        return 1;
    }

    char line[MAXS] = {0};
    char word[MAXS] = {0};
    char comment[MAXS] = {0};
    char *p = NULL;
    size_t idx = 0;
    int d1, d2, d3, d4;
    float f1 = 0.0;
    FILE *fp = NULL;

    d1 = d2 = d3 = d4 = 0;

    if (!(fp = fopen (argv[1], "r"))) {  /* open/validate file   */
        fprintf (stderr, "error: file open failed '%s'.\n", argv[1]);
        return 1;
    }

    while (fgets (line, MAXS, fp) != NULL)  /* read each line in file */
    {
        /* read buffer through first float */
        sscanf (line, "%d %d %d %s %f", &d1, &d2, &d3, word, &f1);

        p = strrchr (line, '/');            /* find last '/' in line    */
        sscanf (p, "/ %[^\n]%*c", comment); /* read comment and value   */
        p = strrchr (comment, ' ');         /* find last space in comment */
        sscanf (p, " %d", &d4);             /* read last value into d4  */
        *p = 0;                             /* null-terminate comment   */

        printf ("\nline : %zu\n\n %s\n", idx, line);
        printf ("   d1 : %d\n   d2 : %d\n   d3 : %d\n   d4 : %d\n   f1 : %.2f\n",
                d1, d2, d3, d4, f1);
        printf ("   chained : %s\n   comment : %s\n", word, comment);

        idx++;
    }

    fclose (fp);

    return 0;
}

Input

$ cat dat/strwcmt.txt
456    2    39    chained_words    62.5    // comment with more than one word    227
457    2    42    more_chained_w   64.5    // another comment    228
458 3 45 s_n_a_f_u 66.5 // this is still another comment 229

Output

$ ./bin/str_rd_mixed dat/strwcmt.txt

line : 0

 456    2    39    chained_words    62.5    // comment with more than one word    227

   d1 : 456
   d2 : 2
   d3 : 39
   d4 : 227
   f1 : 62.50
   chained : chained_words
   comment : comment with more than one word

line : 1

 457    2    42    more_chained_w   64.5    // another comment    228

   d1 : 457
   d2 : 2
   d3 : 42
   d4 : 228
   f1 : 64.50
   chained : more_chained_w
   comment : another comment

line : 2

 458 3 45 s_n_a_f_u 66.5 // this is still another comment 229

   d1 : 458
   d2 : 3
   d3 : 45
   d4 : 229
   f1 : 66.50
   chained : s_n_a_f_u
   comment : this is still another comment

Note: There is no limit to the different ways to approach this. This is simply one approach. Another would be to tokenize the entire line into separate words, check whether each word begins with a digit (and contains a '.' for a float) and then simply convert all numbers and concatenate all non-number words as needed. It's up to you. The bigger your toolbox, the more ways you will see to approach it.
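
As a rough sketch of that tokenizing approach, the code below splits a line on whitespace with strtok and classifies each token using the rule just described. The names tok_kind, classify and count_tokens are hypothetical. The rule is deliberately naive: a comment word that happens to begin with a digit would be counted as a number.

```c
#include <ctype.h>
#include <string.h>

enum tok_kind { TOK_INT, TOK_FLOAT, TOK_WORD };

/* Hypothetical helper: a token starting with a digit is a number,
 * and a number containing '.' is treated as a float */
enum tok_kind classify (const char *tok)
{
    if (!isdigit ((unsigned char) tok[0]))
        return TOK_WORD;
    return strchr (tok, '.') ? TOK_FLOAT : TOK_INT;
}

/* Hypothetical helper: split line on whitespace (line is modified in
 * place by strtok) and count the tokens of each kind; counts[] is
 * indexed by enum tok_kind */
void count_tokens (char *line, int counts[3])
{
    counts[TOK_INT] = counts[TOK_FLOAT] = counts[TOK_WORD] = 0;

    for (char *tok = strtok (line, " \t\n"); tok; tok = strtok (NULL, " \t\n"))
        counts[classify (tok)]++;
}
```

In a real parser, the loop body would convert numeric tokens with strtol or strtod and append word tokens to the comment buffer rather than only counting them.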
