MichaelX

Reputation: 202

Find and replace a word in a file: how can I avoid reading the entire file into a buffer?

I have an assignment where I'm supposed to write to a file, then perform a find and replace on it, with the condition that the old word must have the same length as the new one.

What I'm currently doing is finding the file size, allocating a buffer of that size, reading the entire file into the buffer, changing the words, then writing the buffer back to the file.

This would fail if the file is too big. The only thing I can think of to avoid this is:

  1. Check if the buffer contains \n
  2. If it doesn't (the entire line wasn't read), use realloc to increase its size by some amount (the original size, for example)
  3. Delete the last n characters in the buffer, where n is the length of the word we want to replace. (To avoid reading the same data again)
  4. Set the file pointer back by n. (Because the word could be cut)

Is there any other method? This feels complicated, and realloc causes some issues that might make the program need new buffers.

This is the current code where I read the entire file at once:

void replace_word(const char *s, const char *old_word, const char *new_word){
    FILE *original_file;

    if((original_file = fopen(s, "r+")) == NULL){
        perror(s);
        exit(EXIT_FAILURE);
    }

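    /* fsize() is the asker's own helper (not shown here) */
    const int BUFFER_SIZE = fsize(s);
    char *buffer = malloc(BUFFER_SIZE + 1);   /* +1 for the terminating '\0' */
    if(buffer == NULL){
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    char *init_loc = buffer;

    int word_len = strlen(old_word);
    int word_frequency = 0;

    /* fread reads the whole file; fgets would stop at the first newline */
    size_t bytes_read = fread(buffer, 1, BUFFER_SIZE, original_file);
    buffer[bytes_read] = '\0';

    while((buffer = strstr(buffer, old_word))){
        memcpy(buffer, new_word, word_len);
        buffer += word_len;   /* resume the search after the replaced word */
        word_frequency++;
    }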

    buffer = init_loc;
    rewind(original_file);
    fputs(buffer, original_file);

    printf("'%s' found %i times\n", old_word, word_frequency);

    fclose(original_file);

    free(buffer);

}

Upvotes: 1

Views: 98

Answers (2)

Manny_Mar

Reputation: 398

I don't know if this is the best solution or not, but I would just look at one word at a time. When you find the word you want to change, go back by the length of the word you just read and overwrite it. As long as the new word is the same length, it should work.

Use fgetc to get one char at a time from your file. Replace getchar with fgetc in the code below.

Just modify this code to work with fgetc. It is from K&R, the famous book on C, which I read 10 months ago to learn C. I've used it a few times in my own code, and it works fine.

#include <stdio.h>
#include <ctype.h>

/* getword: get next word or character from input */
int getword(char *word, int lim)
{
    int c, getch(void);
    void ungetch(int);
    char *w = word;

    while (isspace(c = getch()))
        ;
    if (c != EOF)
        *w++ = c;
    if (!isalpha(c)) {
        *w = '\0';
        return c;
    }
    for ( ; --lim > 0; w++)
        if (!isalnum(*w = getch())) {
            ungetch(*w);
            break;
        }
    *w = '\0';
    return word[0];
}

#define BUFSIZE 100
char buf[BUFSIZE]; /* buffer for ungetch */
int bufp = 0;      /* next free position in buf */

int getch(void) /* get a (possibly pushed-back) character */
{
    return (bufp > 0) ? buf[--bufp] : getchar(); //change to fgetc  
}

void ungetch(int c)    /* push character back on input */
{
    if (bufp >= BUFSIZE)
        printf("ungetch: too many characters\n");
    else
        buf[bufp++] = c;
}

You can make the maximum size of the array anything you want. It's set to 100 here, since there should be no words longer than 100 characters.

Just modify the code to read with fgetc, and stop when you hit EOF.
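
The word-at-a-time idea above could be sketched roughly as follows. This is only an illustration, not the K&R code: the function name replace_whole_words and the MAX_WORD limit are made up for the example. It assumes a "word" is a run of alphanumeric characters, so unlike strstr it will not replace "cat" inside "concat". The file is opened in binary mode so that the byte count matches the file offset.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD 100   /* hypothetical limit on the word length */

/* Hypothetical helper: replaces whole words equal to old_word with new_word
 * (same length) in place. Returns the number of replacements, -1 on error. */
long replace_whole_words(const char *path, const char *old_word,
                         const char *new_word)
{
    size_t n = strlen(old_word);
    if (n == 0 || n >= MAX_WORD || n != strlen(new_word))
        return -1;

    FILE *f = fopen(path, "rb+");
    if (f == NULL)
        return -1;

    char word[MAX_WORD];
    size_t len = 0;   /* length of the word being read */
    long pos = 0;     /* number of bytes consumed so far */
    long start = 0;   /* file offset where the current word began */
    long count = 0;
    int ok = 1, c;

    do {
        c = fgetc(f);
        if (c != EOF)
            pos++;
        if (c != EOF && isalnum(c)) {
            if (len == 0)
                start = pos - 1;
            if (len < MAX_WORD - 1)
                word[len] = (char)c;   /* overlong words are never a match */
            len++;
            continue;
        }
        /* A delimiter or EOF ended the current word (if there was one). */
        if (len == n && memcmp(word, old_word, n) == 0) {
            /* Go back, overwrite, then return to where reading stopped. */
            if (fseek(f, start, SEEK_SET) != 0 ||
                fwrite(new_word, 1, n, f) != n ||
                fseek(f, pos, SEEK_SET) != 0) {
                ok = 0;
                break;
            }
            count++;
        }
        len = 0;
    } while (c != EOF);

    if (fclose(f) != 0)
        ok = 0;
    return ok ? count : -1;
}
```

Only one word is held in memory at a time, so the file size does not matter. The fseek calls around fwrite are needed because C requires a positioning call when switching between reading and writing on the same stream.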

Upvotes: 0

Mike Nakis

Reputation: 61993

You can do it with a "sliding window" algorithm using just one fixed buffer of any length that you want, as long as the buffer is longer than the word you are looking for. The pseudocode to search for a word of length N would look as follows:

  • Begin with a buffer full of data from the file.
  • Loop:
    • Search for the word in the buffer; if found:
      • calculate the offset of the word in the file
      • write the replacement over it.
    • move the last N - 1 characters from the end of the buffer to the beginning of the buffer. (That's because these characters may contain part of the word, and the remaining part may be in the beginning of the next buffer that you will read.)
    • fill the remainder of the buffer from the file.
    • repeat the above loop until you reach the end of the file.

For this to perform well, the buffer must be much longer than the word. So, if your word is up to 100 characters long, the buffer should be at least 4 kilobytes long. But 64 and even 128 kilobyte buffers work well in modern systems.

Do not forget to seek to the right offset before each read operation.
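
A minimal C sketch of the sliding-window steps above, under some assumptions: the function name replace_word_windowed and its buf_size parameter are invented for this example, matches are non-overlapping (as with a strstr loop that skips past each replacement), and the file is opened in binary mode so that buffer offsets map directly to file offsets.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replaces every non-overlapping occurrence of old_word
 * with new_word (same length) in place, using one buffer of buf_size bytes.
 * Returns the number of replacements, or -1 on error. */
long replace_word_windowed(const char *path, const char *old_word,
                           const char *new_word, size_t buf_size)
{
    size_t n = strlen(old_word);
    if (n == 0 || n != strlen(new_word) || buf_size < n)
        return -1;

    FILE *f = fopen(path, "rb+");
    if (f == NULL)
        return -1;
    char *buf = malloc(buf_size);
    if (buf == NULL) {
        fclose(f);
        return -1;
    }

    long base = 0;     /* file offset of buf[0] */
    size_t have = 0;   /* number of valid bytes in buf */
    long count = 0;
    int ok = 1;

    while (ok) {
        /* Seek to the first unread byte, then fill the rest of the buffer. */
        if (fseek(f, base + (long)have, SEEK_SET) != 0) {
            ok = 0;
            break;
        }
        size_t got = fread(buf + have, 1, buf_size - have, f);
        if (got == 0) {            /* end of file, or a read error */
            ok = !ferror(f);
            break;
        }
        have += got;

        /* Search the window; overwrite each match directly in the file. */
        size_t i = 0;
        while (i + n <= have) {
            if (memcmp(buf + i, old_word, n) == 0) {
                if (fseek(f, base + (long)i, SEEK_SET) != 0 ||
                    fwrite(new_word, 1, n, f) != n) {
                    ok = 0;
                    break;
                }
                count++;
                i += n;            /* skip past the replaced word */
            } else {
                i++;
            }
        }

        /* Keep the unsearched tail (at most n - 1 bytes) for the next round. */
        have -= i;
        memmove(buf, buf + i, have);
        base += (long)i;
    }

    free(buf);
    if (fclose(f) != 0)
        ok = 0;
    return ok ? count : -1;
}
```

A call such as replace_word_windowed("notes.txt", "cat", "dog", 65536) would use a 64 KB window; the file name here is just a placeholder. Memory use stays at buf_size no matter how large the file is, and a word split across two reads is still found because the tail of each window is carried over to the next one.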

Upvotes: 2
