mike

Reputation: 787

C reading a text file separated by spaces with unbounded word size

I have a text file that contains words (strings) separated by spaces. Neither the length of each word nor the number of words is bounded. I need to put all the words from the file into a list (assume the list itself works fine). I cannot figure out how to overcome the unbounded word size problem. I have tried this:

FILE* f1;
f1 = fopen("file1.txt", "rt");
int a = 1;

char c = fgetc(f1);
while (c != ' '){
    c = fgetc(f1);
    a = a + 1;
}
char * word = " ";
fgets(word, a, f1);
printf("%s", word);
fclose(f1);
getchar();

My text file looks like this:

 this is sparta

Notice that all I was able to get was the first word, and even that I do improperly, because I get the error:

Access violation writing location 0x00B36860.

Can someone please help me?

Upvotes: 2

Views: 5290

Answers (2)

Weather Vane

Reputation: 34585

Taking suggestions from the commenters above, this doubles the buffer whenever a word fills it completely (which also happens when the word only just fits), then seeks back and re-reads that word with the larger buffer.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void fatal(const char *msg) {                           // report error and quit
    fprintf(stderr, "%s\n", msg);
    exit (1);
}

int main() {
    FILE* f1 = NULL;
    char *word = NULL;
    size_t size = 2;
    long fpos = 0;
    char format [32];

    if ((f1 = fopen("file1.txt", "rt")) == NULL)        // open file
        fatal("Failed to open file");
    if ((word = malloc(size)) == NULL)                  // word memory
        fatal("Failed to allocate memory");
    sprintf (format, "%%%us", (unsigned)size-1);        // format for fscanf

    while(fscanf(f1, format, word) == 1) {
        while (strlen(word) >= size-1) {                // is buffer full?
            size *= 2;                                  // double buff size
            printf ("** doubling to %u **\n", (unsigned)size);
            if ((word = realloc(word, size)) == NULL)
                fatal("Failed to reallocate memory");
            sprintf (format, "%%%us", (unsigned)size-1);// new format spec
            fseek(f1, fpos, SEEK_SET);                  // re-read the word
            if (fscanf(f1, format, word) != 1)          // 0 or EOF is a failure
                fatal("Failed to re-read file");
        }
        printf ("%s\n", word);
        fpos = ftell(f1);                               // mark file pos
    }

    free(word);
    fclose(f1);
    return(0);
}

Program input

this   is  sparta
help 30000000000000000000000000000000000000000
me

Program output:

** doubling to 4 **
** doubling to 8 **
this
is
sparta
help
** doubling to 16 **
** doubling to 32 **
** doubling to 64 **
30000000000000000000000000000000000000000
me

Upvotes: 3

Jonathan Leffler

Reputation: 753695

Which platform are you on?

If you're using a POSIX-ish platform, then consider using getline() to read lines of unbounded size, then one of strcspn(), strpbrk(), strtok_r(), or (if you are really determined to make your code not reusable) strtok() to get the boundaries of the words, and finally use strdup() to create copies of the words. The pointers returned by strdup() will be stored in an array of char * managed via realloc().
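A minimal sketch of that approach, assuming a POSIX.1-2008 system. The names word_list, word_list_push(), read_words() and word_list_free() are made up for illustration and stand in for whatever list you already have; words are taken to be separated by spaces, tabs or newlines.

```c
#define _POSIX_C_SOURCE 200809L   /* for getline(), strtok_r(), strdup() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container: a growable array of heap-allocated words. */
struct word_list {
    char **items;
    size_t count;
    size_t capacity;
};

/* Append a copy of `word`; returns 0 on success, -1 if allocation fails. */
int word_list_push(struct word_list *wl, const char *word)
{
    if (wl->count == wl->capacity) {
        size_t new_cap = wl->capacity ? wl->capacity * 2 : 8;
        char **tmp = realloc(wl->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        wl->items = tmp;
        wl->capacity = new_cap;
    }
    char *copy = strdup(word);
    if (copy == NULL)
        return -1;
    wl->items[wl->count++] = copy;
    return 0;
}

/* Read every whitespace-separated word from `fp` into `wl`.
 * Returns 0 on success, -1 if an allocation fails. */
int read_words(FILE *fp, struct word_list *wl)
{
    assert(fp != NULL && wl != NULL);
    char *line = NULL;            /* getline() allocates and grows this */
    size_t cap = 0;
    int rc = 0;

    while (rc == 0 && getline(&line, &cap, fp) != -1) {
        char *save = NULL;
        for (char *tok = strtok_r(line, " \t\r\n", &save);
             tok != NULL;
             tok = strtok_r(NULL, " \t\r\n", &save)) {
            if (word_list_push(wl, tok) != 0) {
                rc = -1;
                break;
            }
        }
    }
    free(line);
    return rc;
}

/* Release every stored word and the array itself. */
void word_list_free(struct word_list *wl)
{
    for (size_t i = 0; i < wl->count; i++)
        free(wl->items[i]);
    free(wl->items);
    wl->items = NULL;
    wl->count = wl->capacity = 0;
}
```

To use it, start from a zeroed struct word_list, call read_words() on the opened file, and call word_list_free() when you are done. Doubling the capacity keeps the number of realloc() calls logarithmic in the number of words.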

If you're not on a sufficiently POSIX-ish platform, then you'll need to use fgets() and check whether you actually read a whole line, using realloc() to allocate more space if your initial buffer isn't long enough. Once you've got a line, you can then split it up as before.
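The fgets() route could look roughly like this, using only standard C. The helper name read_whole_line() and the starting capacity of 16 are arbitrary choices for the sketch. It treats a full buffer with no trailing newline as "line continues" and doubles the buffer; it does not guard against lines longer than INT_MAX bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns one malloc'd line (newline kept if present),
 * or NULL at end of file or on allocation failure. Caller must free(). */
char *read_whole_line(FILE *fp)
{
    assert(fp != NULL);
    size_t cap = 16;              /* arbitrary starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    /* Each fgets() call appends to what has been read so far. */
    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf;           /* got the whole line */
        if (len + 1 < cap)
            return buf;           /* buffer not full: last line, no newline */

        /* Buffer is full and the line continues: double it and read on. */
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (len == 0) {               /* nothing read: end of file or error */
        free(buf);
        return NULL;
    }
    return buf;                   /* EOF hit right after a full buffer */
}
```

Call it in a loop until it returns NULL, split each line into words, and free() the line afterwards.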

You could mess around with POSIX getdelim() except it only takes a single delimiter and you probably want spaces and newlines to mark the ends of words (and possibly tabs too), which it won't handle.

And, again if you're on a sufficiently modern POSIX system, you can consider using the m modifier to scanf():

char *word = 0;

while (scanf("%ms", &word) == 1)
    …store word in your list…

This is even simpler when it is available. Each word is allocated by scanf(), so it must eventually be passed to free().
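Applied to a file rather than standard input, the same idea might look like the sketch below. It assumes a C library whose fscanf() supports the POSIX.1-2008 m modifier (glibc does); read_all_words() is a made-up name, and a plain array stands in for your list.

```c
#define _POSIX_C_SOURCE 200809L   /* "%ms" is POSIX.1-2008, not ISO C */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads every whitespace-separated word from `fp`.
 * Returns a malloc'd array of malloc'd strings and stores its length in
 * *count. Returns NULL with *count == 0 if there are no words.
 * The caller frees each word and then the array. */
char **read_all_words(FILE *fp, size_t *count)
{
    assert(fp != NULL && count != NULL);
    char **words = NULL;
    size_t n = 0, cap = 0;
    char *word = NULL;

    /* fscanf() allocates a buffer of the right size for each word. */
    while (fscanf(fp, "%ms", &word) == 1) {
        if (n == cap) {
            size_t new_cap = cap ? cap * 2 : 8;
            char **tmp = realloc(words, new_cap * sizeof *tmp);
            if (tmp == NULL) {
                free(word);       /* out of memory: keep what we have */
                break;
            }
            words = tmp;
            cap = new_cap;
        }
        words[n++] = word;        /* the array now owns this allocation */
    }
    *count = n;
    return words;
}
```

The trade-off is portability: this will not work with C libraries that lack the m modifier, in which case the getline() or fgets() routes are the fallback.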

Upvotes: 2
