Reputation: 11
I have to read a text file of 264064 words into a buffer and then build a separate array of word pointers into that buffer. I am not sure how to create the array of word pointers, since each pointer refers to a word of a different length inside the buffer. Any hints on how to approach this problem?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
int i,wordCount=0;
long bufsize;
int ch; // int rather than char, so that the comparison with EOF is reliable
//Open File and get number of lines in file
FILE *fp = fopen("words2.txt", "r");
if (fp == NULL) {
printf("Error!");
exit(1);
}
do {
ch = fgetc(fp);
if (ch == '\n')
{
wordCount++;
}
} while (ch != EOF);
fclose(fp);
printf("%d\n",wordCount);
//Reading Words into buffer rawtext
char *rawtext;
fp = fopen("words2.txt", "rb");
if (fp != NULL)
{
if (fseek(fp, 0L, SEEK_END) == 0) {
bufsize = ftell(fp);
if (bufsize == -1) {
exit(1);
}
rawtext = malloc(sizeof(char) * (bufsize + 1));
if (fseek(fp, 0L, SEEK_SET) != 0) { exit(1); }
size_t newLen = fread(rawtext, sizeof(char), bufsize, fp);
if (ferror(fp) != 0) {
fputs("Error reading file", stderr);
} else {
rawtext[newLen++] = '\0';
}
}
//Print out buffer
printf("%s",rawtext);
fclose(fp);
free(rawtext);//Free allocated memory
char *ptr[wordCount];//Array for word-pointers
}
}
Upvotes: 0
Views: 55
Reputation: 35164
If you keep rawtext (i.e. do not free it), you can walk through its content with strchr, searching for '\n': store the current position in the array, find the next newline character, terminate the string there by overwriting the newline with '\0', and continue right after it. At the end, each entry of your ptr array points to one word inside rawtext. That is why you must not free rawtext at that point: the pointers would then refer to invalid memory.
The following code should work:
char* currWord = rawtext;
int nrOfWords = 0;
char* newlinePos;
while ((newlinePos = strchr(currWord,'\n')) != NULL) {
*newlinePos = '\0';
ptr[nrOfWords++] = currWord;
currWord = newlinePos + 1;
}
if (*currWord) {
ptr[nrOfWords++] = currWord;
}
Side note: the declaration char *ptr[wordCount] is a variable-length array, which will typically be placed on the stack, and the stack usually has far less space than the heap. This could become a problem if your file contains a lot of words. Use char **ptr = malloc((wordCount+1) * sizeof(char*)) to reserve the memory on the heap instead. Note also the +1 after wordCount, which covers the case where the last word is not terminated by a newline.
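Putting the loop and the side note together, a minimal sketch could look like the following. The helper name split_lines and its count parameter are hypothetical, introduced here only for illustration. It assumes the buffer is already '\0'-terminated, as rawtext is in the question, and it counts the newlines itself rather than relying on wordCount.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post): splits buf in place at
   every '\n' and returns a heap-allocated array of pointers to the words.
   The pointers refer into buf, so buf must outlive the returned array. */
char **split_lines(char *buf, size_t *count)
{
    assert(buf != NULL && count != NULL);
    *count = 0;

    /* Upper bound on the number of words: one per newline, plus one for a
       final word that is not terminated by a newline. */
    size_t capacity = 1;
    for (const char *p = buf; *p != '\0'; p++) {
        if (*p == '\n')
            capacity++;
    }

    char **words = malloc(capacity * sizeof *words);
    if (words == NULL)
        return NULL;

    size_t n = 0;
    char *curr = buf;
    char *nl;
    while ((nl = strchr(curr, '\n')) != NULL) {
        *nl = '\0';          /* terminate the current word */
        words[n++] = curr;   /* remember where it starts */
        curr = nl + 1;       /* continue after the newline */
    }
    if (*curr != '\0')       /* last word without a trailing newline */
        words[n++] = curr;

    *count = n;
    return words;
}
```

With this shape, the caller would call split_lines(rawtext, &n), use the words, then free the returned array, and only afterwards free rawtext. Note that a file read in "rb" mode on Windows may still contain '\r' before each '\n'; this sketch does not strip it.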
Upvotes: 1