Denys Korovets

Reputation: 75

How to parse an input line with sscanf?

I have an input .txt file that looks like this:

Robert Hill 53000 5

Amanda Trapp 89000 3

Jonathan Nguyen 93000 3

Mary Lou Gilley 17000 1 // Note that the name consists of 3 parts!

Warren Rexroad 72000 7

I need to read those lines and parse each one into three fields: name (an array of chars), mileage (int) and years (int).

 sscanf(line, "%[^] %d %d ", name, &mileage, &years);

This doesn't work very well for me, any suggestions?

Upvotes: 2

Views: 2094

Answers (5)

zwol

Reputation: 140639

You have discovered one of the three reasons *scanf should never be used: it's almost impossible to write a format specification that handles nontrivial input syntax, especially if you have to worry about recovering from malformed input. But there are two even more important reasons:

  • Many input specifications, including your %[...] construct, are just as happy to overflow buffers as the infamous gets.
  • Numeric overflow provokes undefined behavior -- the C library is licensed to crash just because someone typed too many digits.

The correct way to parse lines like these is to scan for the first digit with strcspn(line, "0123456789"), or while (*p && !isdigit((unsigned char)*p)) p++;, then use strtoul to convert the numbers that follow.

Upvotes: 0

Filip Roséen

Reputation: 63807

THE PROBLEM

The problem with the format string currently passed to sscanf is that it is ill-formed and, even when fixed, won't do what you want. In %[^] the ] directly after ^ is taken as a member of the set, so the scanset is never closed. If you had used %[^ ] as the first conversion specifier instead, sscanf would read as many characters as it can before hitting a space, which stops after the first word of the name.

If we assume that a name can't contain digits, specifying %[^0123456789] will read the correct data, but it will also include the trailing space between the name and the mileage entry. This is easily solved by replacing that last space with a null byte in name.

To get the number of characters read into name we can use the %n specifier, which tells sscanf to store the number of bytes consumed so far into the matching argument; we can later use this value to correctly "trim" our buffer.

We should also specify a maximum field width for %[^0123456789] so that it can't cause a buffer overflow. This is done by writing the size of our buffer minus one (leaving room for the terminating null byte) directly after the %.


SAMPLE IMPLEMENTATION

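#include <stdio.h>

int
main (void)
{
  char const * line = "Mary Lou Gilley 17000 1";

  char name[255];
  int  mileage, years, name_length;

  /* Three conversions are expected; %n does not count towards the result. */
  if (sscanf(line, "%254[^0123456789]%n %d %d ", name, &name_length, &mileage, &years) != 3)
    {
      fprintf (stderr, "malformed line: '%s'\n", line);
      return 1;
    }

  /* Replace the trailing space after the name with a null byte. */
  name[name_length-1] = '\0';

  printf ("data: '%s', %d, %d\n", name, mileage, years);

  return 0;
}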

data: 'Mary Lou Gilley', 17000, 1

Upvotes: 3

ssm

Reputation: 5373

If you have a function that finds the position of the first digit like so:

// This function returns the position of the
// character (normally a space) before the first digit,
// assuming that the names don't contain digits.
// It returns NULL if there is no such position. Needs <ctype.h>.
char *digitPos(char *s){
    if (s[0] == '\0' || s[1] == '\0') return NULL;
    if (isdigit((unsigned char)s[1])) return s;
    else return digitPos(s+1);
}

You can then just separate the two variables by inserting a '\0' at the right position like so:

pos  = digitPos(line); // This is a pointer to the space
if (pos != NULL) {
    *pos = '\0';           // line must be writable
    strcpy(name, line);    // name must be at least as large as line
    sscanf(pos + 1, "%d %d", &mileage, &years);
}

Upvotes: 1

erik258

Reputation: 16304

This might help you get started. It lacks the intelligence of BLUEPIXY's solution, which handles the trailing whitespace a little better than mine (or you could chop it off yourself).

dan@rachel ~ $ gcc -o t t.c
dan@rachel ~ $ echo "Dan P F 3 21" | ./t
Name:    Dan P F ,
Mileage:         3,
Years:   21.

Here's the code.

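#include <stdio.h>
#include <stdlib.h>

int main(void){
        char *buf = NULL;
        int mileage, years, n;
        /* The m modifier is a POSIX.1-2008 extension: fscanf allocates buf. */
        /* Looping on the return value also stops on malformed input, */
        /* where a while(!feof(stdin)) loop would spin forever. */
        while( (n = fscanf( stdin, "%m[^0-9] %d %d", &buf, &mileage, &years)) == 3 ){
                fprintf(stderr, "Name:\t %s,\nMileage:\t %d,\nYears:\t %d.\n", 
                        buf, mileage, years
                );
                free(buf);
        }
        /* A partial match (1 or 2 items) still allocated the name. */
        if( n > 0 ) free(buf);
        return 0;
}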

Upvotes: 0

BLUEPIXY

Reputation: 40145

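int pos;
sscanf(line, "%*[^0-9]%n", &pos);   // skip the non-digits; pos = index of the first digit
line[--pos]=';';                    // overwrite the space before it with a delimiter
sscanf(line, "%[^;]; %d %d ", name, &mileage, &years);   // name is everything up to ';'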

Upvotes: -1
