Salstrane

Reputation: 21

Reading a line from a file that includes integers

I need to read the following from a data file:

Sabre Corporation 15790 West Henness Lane New Corio, New Mexico 65790

My variables are char companyName[20+1], char companyAddress[30+1], char companyCity[15+1], char companyState[15+1], int companyZip;

I read the first line into companyName just fine using %[^\n], but when I try to read the second line the same way, the variable stays empty.

void getCompanyData(FILE *CompanyFile, char *companyName, char *companyAddress, char *companyCity, char *companyState, int *companyZip)
{
    fscanf(CompanyFile,"%[^\n]%[^\n]%s%s%d", companyName, companyAddress, companyCity, companyState, companyZip);
}

When I run this code and print out the variables, companyName looks fine, "Sabre Corporation", but companyAddress isn't displayed as anything.

If I switch the second input to simply %s it reads the numbers of the address just fine.

Is there a way to read the whole line as one variable like the first line, instead of concatenating a number of other variables into a larger one?

Upvotes: 2

Views: 137

Answers (3)

David C. Rankin

Reputation: 84521

The biggest issue you have is identifying where the Street Address stops and the City begins in:

Sabre Corporation 15790 West Henness Lane New Corio, New Mexico 65790

As @chux mentions, you would expect to see additional commas or other delimiters separating the Corporation Name from the Street Address, and the Street Address from the City Name. All is not lost, but it requires making the assumption that a Street Address will end with an identifiable word, like "boulevard", "drive", "lane", "street", etc. (as well as the abbreviations, e.g. "Blvd.", "Dr.", etc., which you can add as needed). You can then create a simple look-up and compare individual words in the string against it to identify where the Street Address ends and the City begins.

In C, there is nothing you cannot parse as long as you have rules to follow that allow you to locate the beginning and end of what you want out of a larger body of text. Here, for purposes of the example, we will assume the corporation name does not contain numbers (you can add to the code to handle names that do, as suggested in @chux's answer).

There are many ways to approach parsing the needed information from your line of input. For starters, where are you going to store it? Any time you have differing pieces of information you need to coordinate as one object, you should think struct (which then lends itself to an array-of-struct if you have a number of the objects). So in your case we could define a struct (actually a typedef to a struct) similar to:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAXC 1024   /* if you need a constant, #define one (or more) */
#define ADDR   30
#define NAME   20
#define COMP   15

typedef struct {        /* struct to hold address parts */
    char name[NAME+1],
        addr[ADDR+1],
        city[COMP+1],
        state[COMP+1];
    int zip;
} co_t;

As discussed in my comment, a good approach is to read the entire line into a buffer with fgets allowing an independent validation of (1) the read itself, and (2) the parsing of items from the buffer. Additionally, when you read an entire line at a time, the entire line is consumed preparing your input buffer for the next read and you do not have to worry about a fscanf matching or input failure leaving a partially unread line in your input buffer. With that, you could read your line into a buffer with:

int main (int argc, char **argv) {

    char buf[MAXC], *p = buf, *ep = buf, tmp[MAXC];
    co_t company = { .name = "" };
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    if (!fgets (buf, MAXC, fp)) {   /* read entire line into buf */
        fputs ("(user canceled or stream error)\n", stderr);
        return 1;
    }
    if (fp != stdin)    /* close file if not stdin */
        fclose (fp);

(note: above you declare your buffer buf and two pointers into it, a start-pointer p and an end-pointer ep, to help with parsing, along with another temporary buffer tmp. You declare an instance of your struct (initialized to all zeros) and validate that your file is open for reading before actually reading a line from it.)

To separate your corporation name from the rest, we can easily find the first digit in the buffer (with our assumption that your corporate name doesn't contain digits the first digit would be the beginning of the Street Address) and then backup to find the last alpha-character in the name (advance +1 to the first space following it so we bracket the name between the start of buf and our pointer). Then simply copy the name to company.name and nul-terminate it.

Since the digit that begins the address can be any digit, you can either work forward from the start checking with isdigit() or you can use the very-handy strpbrk() function to find the first digit for you returning a pointer to it, e.g.

    if ((p = strpbrk (p, "0123456789"))) {      /* locate 1st digit in buf */
        ep = p;                                 /* set endptr to pointer */
        while (ep > buf && !isalpha (*ep))      /* backup to find name end */
            ep--;
        ep++;                                   /* adv. to space after name */
        if (ep - buf <= NAME) {                 /* if name will fit */
            memcpy (company.name, buf, ep - buf);   /* copy to company.name */
            company.name[ep - buf] = 0;             /* nul-terminate */
        }
    }
    else {  /* no number found, invalid input */
        fputs ("error: street number not found.\n", stderr);
        return 1;
    }

(note: above we save p so it continues to point to the beginning of the address, and use ep to backup to find the end of the name (there could be multiple spaces, etc.. between them) and we have then bracketed the name between buf and ep)

Once you have the wanted text bracketed, the difference in pointers (ep - buf) gives you the number of characters (the length) of the string you need to copy to company.name. There is no nul-terminating character at the end of the name within buf, which rules out strcpy, so you simply use memcpy. There is no reason to use strcpy anyway, as you already know how many characters to copy, so the following is all you need:

            memcpy (company.name, buf, ep - buf);   /* copy to company.name */
            company.name[ep - buf] = 0;             /* nul-terminate */

Recall, p was left pointing to the beginning of the Street Address, but here we run into the problem of where the address stops and the City starts. Our next identifiable milestone in the string is the ',' after the City. So let's use strchr to locate the comma, setting the end-pointer ep to point to it, and copy the entire Address and City to a temporary buffer tmp for further processing.

We need a temporary buffer because we will use strtok to split the temporary buffer into tokens (words), called "tokenizing" the string. The temporary buffer is needed because strtok modifies the buffer it acts on by replacing each series of delimiters with a nul-character as it parses words from the string. The idea here is to separate the tmp buffer into tokens and check word-by-word for the street ending, e.g. "boulevard", "drive", "lane", "street", etc... to locate the end of the Address. (note: all have been listed in lower-case to simplify the comparison).

So all we need do is take each token, convert it to lowercase and compare against each of the street endings to find the end of the address. A short function returning 1 if the token is an ending or 0 if it isn't is all you need, e.g.

/* simple function to look for word that is street ending */
int streetending (const char *s)
{
    char buf[MAXC], *p = buf;   /* temporary buf, convert s to lowercase */
    char *endings[] = { "boulevard", "drive", "lane", "street", NULL },
        **e = endings;  /* pointer to endings */

    strcpy (buf, s);    /* copy s to buf */

    while (*p) {        /* convert buf to all lowercase (for comparison) */
        *p = tolower (*p);
        p++;
    }

    while (*e) {        /* loop over street endings compare to buf */
        if (strcmp (buf, *e) == 0)  /* if match, return success */
            return 1;
        e++;            /* advance pointer to next ending */
    }

    return 0;   /* s not street ending, return failure */
}

With that helper function in place, we can separate the Address and City as follows:

    if ((ep = strchr (p, ','))) {   /* find ',' after city */
        memcpy (tmp, p, ep - p);    /* copy address & city to tmp */
        tmp[ep - p] = 0;            /* nul-terminate */
        /* split tmp into tokens checking for street ending ("Lane") */
        for (char *wp = strtok (tmp, " \n"); wp; wp = strtok (NULL, " \n")) {
            /* keep track of no. of chars added, check it fits -- here */
            strcat (company.addr, wp);      /* copy to address */
            if (streetending (wp)) {        /* check if street ending */
                wp += strlen (wp) + 1;      /* adv. past current word */
                while (!isalpha (*wp))      /* adv. to start of city */
                    wp++;
                strcpy (company.city, wp);  /* copy city to struct */
                break;  /* done */
            }
            strcat (company.addr, " "); /* not street ending, add space */
        }
    }

(note: we use a temporary word-pointer wp above, leaving ep pointing to the ',' after the City name so we can pick up there to separate the State from the Zip)

What you do next is search forward in the remainder of the string, starting from ep, for the next alpha-character that begins the State. We can then reuse the find-the-first-digit logic that located the start of the Address to locate the start of the Zip, e.g.

    while (!isalpha (*ep))  /* adv. endptr to start of state */
        ep++;
    p = ep;                 /* set pointer to start of state */

    if ((ep = strpbrk (ep, "0123456789"))) {    /* locate start of zip */
        char *zp = ep;                          /* set zip pointer */
        while (ep > p && !isalpha (*ep))        /* backup to end of state */
            ep--;
        ep++;
        if (ep - p <= COMP) {                   /* make sure state fits */
            memcpy (company.state, p, ep - p);  /* copy state to struct */
            company.state[ep - p] = 0;          /* nul-terminate */
        }
        if (sscanf (zp, "%d", &company.zip) != 1) { /* convert zip to int */
            fputs ("error: invalid integer for zip.\n", stderr);
            return 1;
        }
    }

(note: another temporary zip-pointer zp was used to save a pointer to the start of the Zip before backing up to find the end of State)

That's basically all you need for a minimum separation of your line into the wanted parts. (It will be up to you to add to that logic if you have numbers in the corporate name and so on.) Putting this basic example all together, you could do something like:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAXC 1024   /* if you need a constant, #define one (or more) */
#define ADDR   30
#define NAME   20
#define COMP   15

typedef struct {        /* struct to hold address parts */
    char name[NAME+1],
        addr[ADDR+1],
        city[COMP+1],
        state[COMP+1];
    int zip;
} co_t;

/* simple function to look for word that is street ending */
int streetending (const char *s)
{
    char buf[MAXC], *p = buf;   /* temporary buf, convert s to lowercase */
    char *endings[] = { "boulevard", "drive", "lane", "street", NULL },
        **e = endings;  /* pointer to endings */

    strcpy (buf, s);    /* copy s to buf */

    while (*p) {        /* convert buf to all lowercase (for comparison) */
        *p = tolower (*p);
        p++;
    }

    while (*e) {        /* loop over street endings compare to buf */
        if (strcmp (buf, *e) == 0)  /* if match, return success */
            return 1;
        e++;            /* advance pointer to next ending */
    }

    return 0;   /* s not street ending, return failure */
}

int main (int argc, char **argv) {

    char buf[MAXC], *p = buf, *ep = buf, tmp[MAXC];
    co_t company = { .name = "" };
    /* use filename provided as 1st argument (stdin by default) */
    FILE *fp = argc > 1 ? fopen (argv[1], "r") : stdin;

    if (!fp) {  /* validate file open for reading */
        perror ("file open failed");
        return 1;
    }

    if (!fgets (buf, MAXC, fp)) {   /* read entire line into buf */
        fputs ("(user canceled or stream error)\n", stderr);
        return 1;
    }
    if (fp != stdin)    /* close file if not stdin */
        fclose (fp);

    if ((p = strpbrk (p, "0123456789"))) {      /* locate 1st digit in buf */
        ep = p;                                 /* set endptr to pointer */
        while (ep > buf && !isalpha (*ep))      /* backup to find name end */
            ep--;
        ep++;                                   /* adv. to space after name */
        if (ep - buf <= NAME) {                 /* if name will fit */
            memcpy (company.name, buf, ep - buf);   /* copy to company.name */
            company.name[ep - buf] = 0;             /* nul-terminate */
        }
    }
    else {  /* no number found, invalid input */
        fputs ("error: street number not found.\n", stderr);
        return 1;
    }

    if ((ep = strchr (p, ','))) {   /* find ',' after city */
        memcpy (tmp, p, ep - p);    /* copy address & city to tmp */
        tmp[ep - p] = 0;            /* nul-terminate */
        /* split tmp into tokens checking for street ending ("Lane") */
        for (char *wp = strtok (tmp, " \n"); wp; wp = strtok (NULL, " \n")) {
            /* keep track of no. of chars added, check it fits -- here */
            strcat (company.addr, wp);      /* copy to address */
            if (streetending (wp)) {        /* check if street ending */
                wp += strlen (wp) + 1;      /* adv. past current word */
                while (!isalpha (*wp))      /* adv. to start of city */
                    wp++;
                strcpy (company.city, wp);  /* copy city to struct */
                break;  /* done */
            }
            strcat (company.addr, " "); /* not street ending, add space */
        }
    }

    while (!isalpha (*ep))  /* adv. endptr to start of state */
        ep++;
    p = ep;                 /* set pointer to start of state */

    if ((ep = strpbrk (ep, "0123456789"))) {    /* locate start of zip */
        char *zp = ep;                          /* set zip pointer */
        while (ep > p && !isalpha (*ep))        /* backup to end of state */
            ep--;
        ep++;
        if (ep - p <= COMP) {                   /* make sure state fits */
            memcpy (company.state, p, ep - p);  /* copy state to struct */
            company.state[ep - p] = 0;          /* nul-terminate */
        }
        if (sscanf (zp, "%d", &company.zip) != 1) { /* convert zip to int */
            fputs ("error: invalid integer for zip.\n", stderr);
            return 1;
        }
    }
    /* if it all worked, your values should be nicely separated */
    printf ("'%s'\n'%s'\n'%s'\n'%s'\n%d\n", 
            company.name, company.addr, company.city, 
            company.state, company.zip);

    return 0;
}

Example Use/Output

Using your input line stored in the file dat/coaddress.txt and adding single-quotes around all string fields to mark the strings extracted, running the program on your input would provide:

$ ./bin/companynameaddr dat/coaddress.txt
'Sabre Corporation'
'15790 West Henness Lane'
'New Corio'
'New Mexico'
65790

Reading one line or a thousand lines is all the same. The only difference to the code would be to wrap the processing in a while (fgets (buf, MAXC, fp)) { ... loop, keep an index for your array-of-struct, and move the closing of the file to the end.

Look things over. There are many, many ways to do this. What we did is called "walking the string", where you "inch-worm" a pair of pointers down the string to extract the wanted information. We used strpbrk and strchr to help advance the pointers, and we let strtok help separate a temporary buffer, looking for the street-ending word "Lane" to determine where Street ended and City began. You could do this at least 10 different ways. Let me know if you have further questions about what was done above.

Upvotes: 2

Loki Astari

Reputation: 264331

A Little translation from C.

"%[^\n]%[^\n]%s%s%d"

%[^\n]        // reads everything up to the new line
              // But does not read the new line character.
              // So there is still a new  line character on the stream.

%[^\n]%[^\n]  // So the first one reads up to the new line.
              // The second one will immediately fail as there is a new line
              // still on the stream and thus not read anything. 

So:

int count = fscanf(CompanyFile,"%[^\n]%[^\n]%s%s%d", /*Variables*/ );
printf("Count = %d\n", count);

Will print 1 as only one variable has been filled.

I know it is tempting to use the following to read a line.

 fscanf(CompanyFile, "%[^\n]\n", /* Variables*/ );

But that is a bad idea, as it makes empty lines hard to spot. On an empty line, %[^\n] matches nothing, so fscanf fails before it reaches the \n in the format and the empty line is never consumed. So it is best to break this into multiple statements.

 int count;
 do {
     count = fscanf(CompanyFile, "%[^\n]", /* Variables*/ );
     fscanf(CompanyFile, "\n");
 } while (count == 0);
 // successfully read the company name and moved on to next line
 // while skipping completely empty lines.

That seems like a logical extension of the above, but it is not the best way to do it. If you assume that a line may start with the '\n' left over from the previous line (and you want to ignore any leading white space on the data line), then you can put a space before the conversion.

 int count = fscanf(CompanyFile, " %[^\n]", /* Variables*/ );
                               // ^ The leading space will drop all white space characters.
                               // this includes new lines so if you expect the last read may
                               // have left the new line on the stream this will drop it.

Another thing to note is that you should always check the return value of fscanf() to make sure the number of variables you expected to be scanned was actually scanned.

Upvotes: 4

chux

Reputation: 153338

Is there a way to read the whole line as one variable like the first line, instead of concatenating a number of other variables into a larger one?

When the goal is to read a line (all characters up to and including a '\n'), use fgets().

char buffer[256];
if (fgets(buffer, sizeof buffer, CompanyFile)) {
  // success!
}

After reading the line into a string, parse it.


"Sabre Corporation 15790 West Henness Lane New Corio, New Mexico 65790" is not clearly parse-able into companyName[], companyAddress[], companyCity[], companyState[], companyZip without additional rules.

I'd expect more commas.

Consider

3M 255 Century Ave N Maplewood, MN 55119 (company name with digits)
Southwest Airlines P.O. Box 36647 Dallas, Texas 75235 (No street number, PO Box)

Upvotes: 2
