kerma

Reputation: 13

Read from file word by word with spaces in C

The input data in the text file is as follows:

Mike     107        Europe             6798  
Steve    023        New York           2588
Adam     057        South Town         1902
Adam     124        Asia               5555

The fields are separated by whitespace.

C code to read data from file is:

char name[30];
int code;
char local[30];
int id;


while (fscanf(fp, "%s %d %[^ ] %d", name, &code, local, &id) != EOF){

    printf("%s-%d-%s-%d\n", name, code, local, id);
}

The problem with my code is that it always reads only one word for local. The output of my print is:

Mike     107        Europe             6798  
Steve    023        New                2588
Adam     057        South              1902
Adam     124        Asia               5555

New York and South Town are truncated. How do I fix that?

Upvotes: 0

Views: 571

Answers (2)

Clifford

Reputation: 93476

Your data seems to want it both ways: to use space as a delimiter and also as content. That is at best an ambiguous design decision, and CSV might be a better solution. However, you can probably use assumptions about the data content that will, at least for the example input, solve the problem.

Assuming the local column cannot contain digit characters, then:

fscanf(fp, "%s %d %[^0123456789\t] %d", name, &code, local, &id)

I have included \t (TAB) to be sure that does not trip it up. The effect of this, however, is that local includes all the trailing spaces. You could trim them thus:

for( int i = (int)strlen(local) - 1; i >= 0 && local[i] == ' '; i-- )
{
    local[i] = 0 ;
}
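Putting the scanset and the trim together, a minimal sketch might look like the following. The struct record type and the parse_record name are hypothetical, introduced only for illustration. The field widths (29) and the \n in the scanset are additions that are not in the format string above. It still assumes the local column never contains a digit or a TAB.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record type; buffer sizes match those in the question. */
struct record {
    char name[30];
    int  code;
    char local[30];
    int  id;
};

/* Parse one line of input. Returns 1 on success, 0 if the line does not
   contain all four fields. */
int parse_record(const char *line, struct record *out)
{
    assert(line != NULL && out != NULL);

    /* The widths (29) stop sscanf from writing past the 30-byte buffers.
       Assumption: local never contains a digit or a TAB. */
    if (sscanf(line, "%29s %d %29[^0123456789\t\n] %d",
               out->name, &out->code, out->local, &out->id) != 4)
        return 0;

    /* Trim trailing spaces; the length check stops at the start of the
       buffer even if local were all spaces. */
    size_t len = strlen(out->local);
    while (len > 0 && out->local[len - 1] == ' ')
        out->local[--len] = '\0';

    return 1;
}
```

Called on each line returned by fgets, this leaves "New York" in local for the second sample line. Checking the sscanf return value against 4, rather than against EOF, also stops the loop on a malformed line instead of printing stale values.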

Alternatively, you might read the entire line and process it such that a single space is not a delimiter but a run of two or more spaces, or a TAB, is. Or, if you can guarantee there are no TABs, you could extract the field data by column position, but it all gets rather fiddly:

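    while( fgets( line, sizeof(line), fp ) != NULL )
    {
        /* Terminate each field just before the next column starts */
        line[CODE_COL-1] = 0 ;
        line[LOCAL_COL-1] = 0 ;
        line[ID_COL-1] = 0 ;
        sscanf(&line[NAME_COL], "%s", name) ;
        sscanf(&line[CODE_COL], "%d", &code) ;
        /* %[^\n] keeps embedded spaces, which %s would stop at */
        sscanf(&line[LOCAL_COL], "%[^\n]", local) ;
        sscanf(&line[ID_COL], "%d", &id) ;
        /* Trim trailing spaces from local */
        for( int i = (int)strlen(local) - 1; i >= 0 && local[i] == ' '; i-- )
        {
            local[i] = 0 ;
        }
        printf("%s-%d-%s-%d\n", name, code, local, id);
    }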
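The first alternative mentioned above, treating a TAB or a run of two or more spaces as the delimiter while keeping single spaces as content, has no code here, so the following is only a rough sketch of one way to do it. The split_fields name and its signature are hypothetical. It modifies the line in place and assumes that fields are never separated by just one space.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split `line` in place into fields separated by a TAB
   or by a run of two or more spaces. A single space between two words stays
   inside its field. Stores up to `max` field pointers in `fields` and
   returns the number of fields found. */
size_t split_fields(char *line, char *fields[], size_t max)
{
    assert(line != NULL && fields != NULL);

    size_t count = 0;
    char *p = line;

    line[strcspn(line, "\r\n")] = '\0';      /* drop the line ending */

    while (count < max) {
        while (*p == ' ' || *p == '\t')      /* skip delimiter characters */
            p++;
        if (*p == '\0')
            break;
        fields[count++] = p;                 /* a field starts here */

        /* Advance until a TAB, the end of the line, or a space that is
           followed by another space or by the end of the line. */
        while (*p != '\0' && *p != '\t' &&
               !(*p == ' ' && (p[1] == ' ' || p[1] == '\0')))
            p++;

        if (*p != '\0')
            *p++ = '\0';                     /* terminate the field */
    }
    return count;
}
```

With four fields returned, the second and fourth can then be converted with sscanf or strtol, and the third copied into local after checking that it fits.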

Upvotes: 1

KamilCuk

Reputation: 140970

How do I fix that?

You could read up until two consecutive spaces. You would have to implement such logic yourself, for example:

#include <stdio.h>
#include <string.h>

const char input[] =
    "Mike     107        Europe             6798  \n"
    "Steve    023        New York           2588\n"
    "Adam     057        South Town         1902\n"
    "Adam     124        Asia               5555\n"
    ;

int main() {
    FILE *in = fmemopen((char*)input, sizeof(input) - 1, "r");

    char name[30];
    int code;
    char local[30];
    int id;

    char line[250];
    int localstart;
    const char *localend;
    while (fgets(line, sizeof(line), in) != NULL) {
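        if (sscanf(line, "%29s%d %n", name, &code, &localstart) != 2) break;
        // the 'local' column ends at the first run of two spaces
        localend = strstr(line + localstart, "  ");
        if (localend == NULL) break;
        const size_t locallen = localend - (line + localstart);
        // do not copy more than 'local' can hold
        if (locallen >= sizeof(local)) break;
        memcpy(local, line + localstart, locallen);
        local[locallen] = '\0';
        // the id follows the spaces after 'local'
        if (sscanf(localend, "%d", &id) != 1) break;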

        printf("%s-%d-%s-%d\n", name, code, local, id);
    }
}

Alternatively, you could read the whole line and inspect it from the back. The third column starts after the spaces that follow the second column and ends before the spaces that precede the fourth column.

#include <stdio.h>
#include <string.h>
#include <ctype.h>

const char input[] =
    "Mike     107        Europe             6798  \n"
    "Steve    023        New York           2588\n"
    "Adam     057        South Town         1902\n"
    "Adam     124        Asia               5555\n"
    ;

int main() {
    FILE *in = fmemopen((char*)input, sizeof(input) - 1, "r");

    char name[30];
    int code;
    char local[30];
    int id;

    char line[250];
    int localstart;
    while (fgets(line, sizeof(line), in) != NULL) {
        if (sscanf(line, "%29s%d %n", name, &code, &localstart) != 2) break;
        const char *end = line + strlen(line) - 1;
        // ignore trailing whitespace, including the newline
        while (end != line && isspace((unsigned char)*end)) end--;
        // step back over the digits of the last number
        while (end != line && isdigit((unsigned char)*end)) end--;
        // parse the last number
        if (sscanf(end + 1, "%d", &id) != 1) break;
        // step back over the spaces before the last number
        while (end != line && isspace((unsigned char)*end)) end--;
        // the 'local' column ends at end + 1
        const char *localend = end + 1;
        // give up if 'local' is missing or too long for the buffer
        if (localend < line + localstart) break;
        const size_t locallen = localend - (line + localstart);
        if (locallen >= sizeof(local)) break;
        memcpy(local, line + localstart, locallen);
        local[locallen] = '\0';

        printf("%s-%d-%s-%d\n", name, code, local, id);
    }
}

The fmemopen function is available on Linux, which I used for testing here. It may not be available on Windows.
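Where fmemopen is not available, one possible stand-in for testing is the standard C tmpfile function: write the sample text to a temporary file and rewind it. This is only a sketch, and the open_text_stream name is hypothetical. It assumes the environment allows temporary files to be created.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: return a readable stream that contains `text`,
   using only standard C. Returns NULL on failure. The temporary file is
   removed automatically when the stream is closed. */
FILE *open_text_stream(const char *text)
{
    assert(text != NULL);

    FILE *fp = tmpfile();        /* opened for update, so it can be read back */
    if (fp == NULL)
        return NULL;

    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return NULL;
    }

    rewind(fp);                  /* reposition at the start before reading */
    return fp;
}
```

In the programs above, the call to fmemopen could then be replaced by open_text_stream(input), followed by a NULL check and a final fclose.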

Upvotes: 1
