Reputation: 75
I have an input .txt file that looks like this:
Robert Hill 53000 5
Amanda Trapp 89000 3
Jonathan Nguyen 93000 3
Mary Lou Gilley 17000 1 // Note that this name consists of 3 parts!
Warren Rexroad 72000 7
I need to read those lines and parse each one into three fields: name (an array of chars), mileage (int) and years (int).
sscanf(line, "%[^] %d %d ", name, &mileage, &years);
This doesn't work very well for me, any suggestions?
Upvotes: 2
Views: 2094
Reputation: 140639
You have discovered one of the three reasons *scanf should never be used: it's almost impossible to write a format specification that handles nontrivial input syntax, especially if you have to worry about recovering from malformed input. But there are two even more important reasons; one of them is that several conversions, including the %[...] construct, are just as happy to overflow buffers as the infamous gets.
The correct way to parse lines like these is to scan for the first digit with strcspn(line, "0123456789"), or while (*p && !isdigit((unsigned char)*p)) p++;, then use strtoul to convert the numbers that follow.
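The approach described above can be sketched roughly as follows. This is only an illustration: parse_record, its parameters and its 0/-1 return convention are hypothetical names chosen for the example, and it assumes that names never contain digits.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: splits "Name Parts 123 4" into its three fields.
   Returns 0 on success, -1 on malformed input. */
int parse_record(const char *line, char *name, size_t name_size,
                 unsigned long *mileage, unsigned long *years)
{
    size_t digit = strcspn(line, "0123456789");  /* index of the first digit */
    if (line[digit] == '\0')
        return -1;                               /* no number on the line */

    /* Drop the whitespace between the name and the first number. */
    size_t len = digit;
    while (len > 0 && isspace((unsigned char)line[len - 1]))
        len--;
    if (len == 0 || len >= name_size)
        return -1;                               /* empty name or name too long */
    memcpy(name, line, len);
    name[len] = '\0';

    const char *p = line + digit;
    char *end;

    errno = 0;
    *mileage = strtoul(p, &end, 10);
    if (end == p || errno == ERANGE)
        return -1;                               /* not a number, or out of range */

    p = end;
    errno = 0;
    *years = strtoul(p, &end, 10);               /* strtoul skips leading whitespace */
    if (end == p || errno == ERANGE)
        return -1;

    return 0;
}
```

Calling parse_record("Mary Lou Gilley 17000 1", name, sizeof name, &mileage, &years) should leave "Mary Lou Gilley" in name. Unlike *scanf, both a missing number and an out-of-range number are reported to the caller.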
Upvotes: 0
Reputation: 63807
THE PROBLEM
The problem with the current specifier passed to sscanf is that it is ill-formed, and even when fixed it won't do what you want. If you had used %[^ ] as the first conversion specifier, sscanf would read as many characters as it can before hitting a space, so it would stop after the first word of the name.
If we assume that a name can't contain digits, specifying %[^0123456789] will read the correct data, but it will also include the trailing space that sits after the name and before the mileage entry. This is easily solved by replacing that last space with a null byte in name.
To get the number of characters read into name we can use the %n specifier, which asks sscanf to store the number of characters consumed so far into the matching argument; we can later use this value to correctly "trim" our buffer.
We should also specify a maximum field width for %[^0123456789] so that it can't overflow the buffer. This is done by writing the width directly after the %, and it must be one less than the buffer size to leave room for the terminating null byte.
SAMPLE IMPLEMENTATION
#include <stdio.h>

int
main (void)
{
    char const *line = "Mary Lou Gilley 17000 1";
    char name[255];
    int mileage, years, name_length;

    /* %n records how many characters the name conversion consumed;
       it does not count towards sscanf's return value. */
    if (sscanf(line, "%254[^0123456789]%n %d %d", name, &name_length, &mileage, &years) != 3)
        return 1;

    name[name_length-1] = '\0';   /* drop the trailing space */
    printf ("data: '%s', %d, %d\n", name, mileage, years);
    return 0;
}
data: 'Mary Lou Gilley', 17000, 1
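The sample above assumes exactly one space between the name and the mileage. A hedged variation, wrapped in a hypothetical helper called scan_record, strips every trailing whitespace character instead. It assumes the caller passes a name buffer of at least 255 bytes, matching the width in the format string.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper; name must point to a buffer of at least 255 bytes.
   Returns 0 on success, -1 if the line does not match. */
int scan_record(const char *line, char *name, int *mileage, int *years)
{
    int n = 0;

    if (sscanf(line, "%254[^0123456789]%n %d %d", name, &n, mileage, years) != 3)
        return -1;

    /* Strip every trailing whitespace character, not just one space. */
    while (n > 0 && isspace((unsigned char)name[n - 1]))
        name[--n] = '\0';

    return n > 0 ? 0 : -1;    /* reject a name made only of whitespace */
}
```

With this version a line such as "Ann", two tabs, then "5 3" should also yield the name "Ann".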
Upvotes: 3
Reputation: 5373
If you have a function that finds the position of the first digit like so (it needs <ctype.h> for isdigit):
// This function returns the position of the
// space before the first digit (assuming that
// the names don't contain digits), or NULL if
// no digit follows...
char *digitPos(char *s){
    if (*s == '\0') return NULL;
    if (isdigit((unsigned char)*(s+1))) return s;
    else return digitPos(s+1);
}
You can then just separate the two variables by inserting a '\0' at the right position like so:
pos = digitPos(line); // This is a pointer to the space
if (pos != NULL) {
    *pos = '\0';
    strcpy(name, line);
    sscanf(pos + 1, "%d %d", &mileage, &years);
}
Upvotes: 1
Reputation: 16304
This might help you get started. It lacks the intelligence of BLUEPIXY's solution, which handles the trailing whitespace a little better than mine (or you could chop it off yourself).
dan@rachel ~ $ gcc -o t t.c
dan@rachel ~ $ echo "Dan P F 3 21" | ./t
Name: Dan P F ,
Mileage: 3,
Years: 21.
Here's the code.
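#include <stdio.h>
#include <stdlib.h>
int main(){
    char *buf;
    int mileage, years;
    /* %m makes fscanf allocate buf; the leading space skips the newline
       left over from the previous record. The loop stops at end of input
       or at the first malformed record instead of testing feof(). */
    while( fscanf( stdin, " %m[^0-9] %d %d", &buf, &mileage, &years) == 3 ){
        fprintf(stderr, "Name:\t %s,\nMileage:\t %d,\nYears:\t %d.\n",
            buf, mileage, years
        );
        free(buf);
    }
}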
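The code above relies on the %m allocation modifier, which is a POSIX extension rather than standard C. A more portable sketch reads each line with fgets and parses it with sscanf. print_records is a hypothetical helper name, and the sketch assumes every line is shorter than 256 characters.

```c
#include <stdio.h>

/* Hypothetical helper: reads records from `in`, prints them to `out`,
   and returns how many lines parsed successfully. */
int print_records(FILE *in, FILE *out)
{
    char line[256], name[255];
    int mileage, years, count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        /* 254 leaves room for the terminating '\0' in name. */
        if (sscanf(line, " %254[^0123456789] %d %d", name, &mileage, &years) == 3) {
            fprintf(out, "Name:\t %s,\nMileage:\t %d,\nYears:\t %d.\n",
                    name, mileage, years);
            count++;
        }
    }
    return count;
}
```

Calling print_records(stdin, stderr) mirrors the original program. As there, the name keeps its trailing space, and lines that do not match are skipped rather than ending the loop.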
Upvotes: 0
Reputation: 40145
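int pos = 0;
sscanf(line, "%*[^0-9]%n", &pos);  /* pos = index of the first digit */
if (pos > 0) {
    line[--pos]=';';               /* overwrite the space before it; line must be writable */
    sscanf(line, "%[^;]; %d %d ", name, &mileage, &years);
}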
Upvotes: -1