Master Maq

Reputation: 11

Generating an array of word pointers in C

I have to read a text file of 264064 words into a buffer and then build a separate array of word pointers that point into that buffer. I am not sure how to create the pointer array, since each word occupies a different number of characters in the buffer. Any hints on how to approach this problem?

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    int i,wordCount=0;
    long bufsize;
    int ch; // int, not char: fgetc returns an int so that EOF can be told apart from data

    //Open File and get number of lines in file
    FILE *fp = fopen("words2.txt", "r");
    if (fp == NULL) {
        printf("Error!");
        exit(1);
    }
    do {
        ch = fgetc(fp);
        if (ch == '\n')
        {
            wordCount++;
        }

    } while (ch != EOF);
    fclose(fp);
    printf("%d\n",wordCount);

    //Reading Words into buffer rawtext
    char *rawtext;
    fp = fopen("words2.txt", "rb");

    if (fp != NULL)
    {
        if (fseek(fp, 0L, SEEK_END) == 0) {
            bufsize = ftell(fp);
            if (bufsize == -1) {
                exit(1);
            }
            rawtext = malloc(sizeof(char) * (bufsize + 1));

            if (fseek(fp, 0L, SEEK_SET) != 0) { exit(1); }

            size_t newLen = fread(rawtext, sizeof(char), bufsize, fp);
            if (ferror(fp) != 0) {
                fputs("Error reading file", stderr);
            } else {
                rawtext[newLen++] = '\0';
            }
        }
        //Print out buffer
        printf("%s",rawtext);
        fclose(fp);
        free(rawtext);//Free allocated memory

        char *ptr[wordCount];//Array for word-pointers
    }
}

Upvotes: 0

Views: 55

Answers (1)

Stephan Lechner

Reputation: 35164

If you keep rawtext (i.e. do not free it), you can walk through its content with strchr(currWord, '\n'): store the current position in the array, find the next newline character, overwrite it with '\0' to terminate the word, and continue right after it. At the end, each entry of your ptr array points to one word inside rawtext. That is why you must not free rawtext while the pointers are still in use; they would otherwise point to invalid memory.

The following code should work:

char* currWord = rawtext;
int nrOfWords = 0;
char* newlinePos;
while ((newlinePos = strchr(currWord,'\n')) != NULL) {
  *newlinePos = '\0';
  ptr[nrOfWords++] = currWord;
  currWord = newlinePos + 1;
}
if (*currWord) {
  ptr[nrOfWords++] = currWord;
}
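
To try the loop without the file, here is a minimal sketch that wraps it in a function working on any writable, NUL-terminated buffer. The name split_words and the max_words bound are my own additions, not part of the question's code, and stripping a trailing '\r' is an assumption that only matters if the file has Windows line endings (the question opens it in "rb" mode, so they would not be translated).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits buf in place at each '\n' and stores the
 * start of every word in out[]. Writes at most max_words pointers and
 * returns how many were stored. The pointers refer into buf, so buf
 * must stay allocated for as long as out[] is used. */
size_t split_words(char *buf, char **out, size_t max_words)
{
    assert(buf != NULL && out != NULL);

    size_t n = 0;
    char *curr = buf;
    char *nl;

    while (n < max_words && (nl = strchr(curr, '\n')) != NULL) {
        *nl = '\0';                  /* terminate the current word */
        if (nl > curr && nl[-1] == '\r')
            nl[-1] = '\0';           /* assumption: drop '\r' of a "\r\n" ending */
        out[n++] = curr;
        curr = nl + 1;               /* next word starts after the newline */
    }
    if (n < max_words && *curr != '\0')
        out[n++] = curr;             /* last word without a trailing newline */

    return n;
}
```

Calling split_words(rawtext, ptr, wordCount + 1) after the fread would fill ptr. Note that an empty line in the file ends up as an empty string in the array.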

Side note: the expression char *ptr[wordCount] declares a variable-length array, which is typically placed on the stack, where space is more limited than on the heap. This could become a problem if your file contains a lot of words. Use char **ptr = malloc((wordCount+1) * sizeof(char*)) to reserve the memory on the heap instead, and free it once you no longer need the pointers. Note also the +1 after wordCount, for the case that the last word is not terminated by a newline.
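
The heap allocation could look roughly like the sketch below. Instead of reading the file a second time, it counts the words directly in the buffer, which also takes care of the +1 case. Both function names (count_words, alloc_word_pointers) are made up for illustration; the text argument is assumed to be the NUL-terminated rawtext from the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts the words in a NUL-terminated buffer,
 * one per '\n', plus one if the text does not end with a newline. */
size_t count_words(const char *text)
{
    assert(text != NULL);

    size_t count = 0;
    size_t len = strlen(text);

    for (size_t i = 0; i < len; i++)
        if (text[i] == '\n')
            count++;
    if (len > 0 && text[len - 1] != '\n')
        count++;                     /* unterminated last word */

    return count;
}

/* Hypothetical helper: allocates the pointer array on the heap.
 * Returns NULL on failure; the caller releases it with free(). */
char **alloc_word_pointers(size_t word_count)
{
    /* Ask for at least one slot so a zero count does not call malloc(0). */
    return malloc((word_count ? word_count : 1) * sizeof(char *));
}
```

With this, char **ptr = alloc_word_pointers(count_words(rawtext)); replaces the stack array. Check the result for NULL, and free ptr before rawtext once you are done with the words.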

Upvotes: 1
