Reputation: 202
I have an assignment where I'm supposed to write to a file, then perform a find and replace on it, with the condition that the old word must have the same length as the new one.
What I'm currently doing is finding the file size, allocating a buffer of that size, reading the entire file into the buffer, changing the words, then writing the buffer back to the file.
This would fail if the file is too big. The only thing I can think of to avoid this is:
- use realloc to increase the buffer's size by some amount (the original size, for example);
- [...] n characters in the buffer, where n is the length of the word we want to replace (to avoid reading the same data again);
- [...] n (because the word could be cut).
Is there any other method? This feels complicated, and realloc causes some issues that might make the program need new buffers.
This is the current code where I read the entire file at once:
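#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void replace_word(const char *s, const char *old_word, const char *new_word){
    FILE *original_file;
    if((original_file = fopen(s, "r+")) == NULL){
        perror(s);
        exit(EXIT_FAILURE);
    }
    const int BUFFER_SIZE = fsize(s);        /* fsize() is my own helper, not shown */
    char *buffer = malloc(BUFFER_SIZE + 1);  /* +1 for the terminating '\0' */
    if(buffer == NULL){
        perror("malloc");
        fclose(original_file);
        exit(EXIT_FAILURE);
    }
    char *init_loc = buffer;
    int word_len = strlen(old_word);
    int word_frequency = 0;
    /* fread reads the whole file; fgets would stop at the first newline */
    size_t bytes_read = fread(buffer, 1, BUFFER_SIZE, original_file);
    buffer[bytes_read] = '\0';
    while((buffer = strstr(buffer, old_word))){
        memcpy(buffer, new_word, word_len);
        buffer += word_len;                  /* continue after the replaced word */
        word_frequency++;
    }
    buffer = init_loc;
    rewind(original_file);
    fwrite(buffer, 1, bytes_read, original_file);
    printf("'%s' found %i times\n", old_word, word_frequency);
    fclose(original_file);
    free(buffer);
}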
Upvotes: 1
Views: 98
Reputation: 398
I don't know if this is the best solution or not, but I would just look at one word at a time. When you find the word you want to change, go back by the size of the word you read and overwrite it. As long as the new word is the same size, it should work.
Use fgetc to get one char at a time from your file: replace getchar with fgetc in the code below.
The code is from the famous K&R book on C, which I read 10 months ago to learn C. I've used it a few times in my own code, and it works fine.
#include <stdio.h>
#include <ctype.h>

/* getword: get next word or character from input */
int getword(char *word, int lim)
{
    int c, getch(void);
    void ungetch(int);
    char *w = word;

    while (isspace(c = getch()))
        ;
    if (c != EOF)
        *w++ = c;
    if (!isalpha(c)) {
        *w = '\0';
        return c;
    }
    for ( ; --lim > 0; w++)
        if (!isalnum(*w = getch())) {
            ungetch(*w);
            break;
        }
    *w = '\0';
    return word[0];
}

#define BUFSIZE 100

char buf[BUFSIZE]; /* buffer for ungetch */
int bufp = 0;      /* next free position in buf */

int getch(void) /* get a (possibly pushed-back) character */
{
    return (bufp > 0) ? buf[--bufp] : getchar(); /* change to fgetc */
}

void ungetch(int c) /* push character back on input */
{
    if (bufp >= BUFSIZE)
        printf("ungetch: too many characters\n");
    else
        buf[bufp++] = c;
}
You can make the maximum size of the array anything you want. It's set to 100, since there should be no words longer than 100 chars.
Just modify the code to read from fgetc, and stop when you hit EOF.
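As a rough illustration of the "go back by the size of the word and overwrite it" step, here is a minimal sketch. It does not use the K&R getword above; it reads with fgetc directly. The function name replace_words and the MAXWORD limit are made up for this example, and it assumes a "word" is a run of alphanumeric characters, so it only replaces whole words, not substrings.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAXWORD 100 /* assumed upper bound on word length */

/* Hypothetical helper: overwrite every whole word equal to old_word with
 * new_word (same length), in place. Returns the number of replacements,
 * or -1 on error. */
long replace_words(const char *path, const char *old_word, const char *new_word)
{
    size_t n = strlen(old_word);
    if (n == 0 || n != strlen(new_word) || n >= MAXWORD)
        return -1;

    FILE *fp = fopen(path, "rb+");
    if (fp == NULL)
        return -1;

    char word[MAXWORD];
    size_t len = 0; /* length of the word currently being read */
    long count = 0;

    for (;;) {
        int c = fgetc(fp);
        if (c != EOF && isalnum(c)) {
            if (len < MAXWORD)
                word[len] = (char)c; /* overlong words are never matched */
            len++;
            continue;
        }
        /* c is a delimiter or EOF, so the current word has ended */
        if (len == n && memcmp(word, old_word, n) == 0) {
            /* step back over the word, plus the delimiter if one was read */
            long back = (long)n + (c != EOF ? 1 : 0);
            if (fseek(fp, -back, SEEK_CUR) != 0 ||
                fwrite(new_word, 1, n, fp) != n ||
                fseek(fp, 0, SEEK_CUR) != 0) { /* needed before reading again */
                fclose(fp);
                return -1;
            }
            count++;
        }
        len = 0;
        if (c == EOF)
            break;
    }
    fclose(fp);
    return count;
}
```

Only one word is held in memory at a time, so the file size does not matter. The fseek(fp, 0, SEEK_CUR) after the write is there because C requires a positioning call (or fflush) when switching from writing to reading on an update stream.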
Upvotes: 0
Reputation: 61993
You can do it with a "sliding window" algorithm using just one fixed buffer of any length that you want, as long as the buffer is longer than the word you are looking for. The pseudocode to search for a word of length N would look as follows:
For this to perform well, the buffer must be much longer than the word. So, if your word is up to 100 characters long, the buffer should be at least 4 kilobytes long. Buffers of 64 or even 128 kilobytes work well on modern systems.
Do not forget to seek to the right offset before each read operation.
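One possible way to write such a sliding window in C is sketched below. This is an illustration under stated assumptions, not the pseudocode referred to above: the function name replace_in_place and the CHUNK size are invented for the example, and it assumes the old and new words have the same non-zero length, shorter than CHUNK. Each window starts at the first byte the previous window could not fully check, so a word cut by a window boundary is seen whole in the next read.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK 4096 /* window size; assumed much larger than the word */

/* Hypothetical helper: replace every occurrence of old_word with new_word
 * (same length) in the file at path, using one fixed buffer.
 * Returns the number of replacements, or -1 on error. */
long replace_in_place(const char *path, const char *old_word, const char *new_word)
{
    size_t n = strlen(old_word);
    if (n == 0 || n != strlen(new_word) || n >= CHUNK)
        return -1;

    FILE *fp = fopen(path, "rb+");
    if (fp == NULL)
        return -1;

    char buf[CHUNK];
    long offset = 0; /* file offset of buf[0] */
    long count = 0;

    for (;;) {
        /* seek to the right offset before each read */
        if (fseek(fp, offset, SEEK_SET) != 0)
            goto fail;
        size_t got = fread(buf, 1, CHUNK, fp);
        if (got < n)
            break; /* too few bytes left to hold the word */

        int dirty = 0;
        size_t i = 0;
        while (i + n <= got) {
            if (memcmp(buf + i, old_word, n) == 0) {
                memcpy(buf + i, new_word, n);
                i += n; /* skip the replaced bytes */
                count++;
                dirty = 1;
            } else {
                i++;
            }
        }
        /* write the window back only if something changed */
        if (dirty && (fseek(fp, offset, SEEK_SET) != 0 ||
                      fwrite(buf, 1, got, fp) != got))
            goto fail;

        if (got < CHUNK)
            break; /* reached end of file */
        /* the last bytes from i onward may start a word that was cut,
         * so the next window begins there */
        offset += (long)i;
    }
    fclose(fp);
    return count;

fail:
    fclose(fp);
    return -1;
}
```

Memory use is fixed at CHUNK bytes regardless of file size, and no realloc is needed. Since at most n - 1 bytes are re-read per window, a larger CHUNK reduces the relative overhead, which is in line with the buffer-size advice above.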
Upvotes: 2