Reputation: 1
I am new to C, and I am trying to build a C program that scans through a file until EOF, picks out lines that contain a certain keyword and then sets an offset after the last line was searched. When the scan is executed again, it scans the file, this time starting from the saved offset and continues downward until EOF.
I am trying to wrap my head around the different functions of File I/O and I'm having trouble piecing together the procedure to call fopen(), fseek(), fgets(), ftell(), etc to do what I want it to do. Can anyone point me in the right direction or walk me through what I need to get this done?
Thank you!
Upvotes: 0
Views: 6051
Reputation: 799
I would recommend using getline() for reading, ftell() and fseek() for getting and setting the offset, and strstr() for searching individual lines in your case. Note that getline() is a POSIX function, not part of standard C.
I'm not sure I understand what your saving of the offset is all about, but it might look like this:
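#define _POSIX_C_SOURCE 200809L  // getline() is POSIX, not standard C
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int pick_lines(const char *filename, const char *keyword, long *offset)
{
    FILE *fp;
    char *line = NULL;  // getline() allocates and grows this buffer
    size_t len = 0;

    if (offset == NULL || (fp = fopen(filename, "r")) == NULL)
        return 1;
    // Resume where the previous call stopped.
    if (*offset > 0 && fseek(fp, *offset, SEEK_SET) != 0) {
        fclose(fp);
        return 1;
    }
    while (getline(&line, &len, fp) != -1) {
        if (strstr(line, keyword) != NULL)
            printf("%s", line); // or do something else with chosen line
    }
    // Remember the end position for the next call.
    if ((*offset = ftell(fp)) < 0) {
        free(line);
        fclose(fp);
        return 1;
    }
    free(line);
    fclose(fp);
    return 0;
}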
Here offset is an in/out parameter. Its dereferenced value is used to seek to a given offset (start with *offset == 0) and is then reset to the new offset.
This function would just print every line that contains keyword. If you want to return an array of lines instead, a little extra work is needed.
An example of usage might be:
long offset = 0;
pick_lines(filename, keyword, &offset);
// append lines to file
pick_lines(filename, keyword, &offset);
// ...
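For the "array of lines" variant mentioned above, here is a minimal sketch of one way it could look. The names struct line_list, line_list_push, line_list_free and collect_lines are made up for this illustration and are not part of the function above. To stay within standard C it uses fgets() with a fixed buffer instead of getline(), so it assumes no line is longer than 1023 bytes (a longer line would be read in pieces).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for matching lines (illustration only). */
struct line_list {
    char **items;
    size_t count;
};

/* Append a heap copy of line to list. Returns 0 on success, 1 on failure. */
static int line_list_push(struct line_list *list, const char *line)
{
    size_t n = strlen(line) + 1;
    char *copy = malloc(n);
    char **grown;

    if (copy == NULL)
        return 1;
    memcpy(copy, line, n);

    grown = realloc(list->items, (list->count + 1) * sizeof *grown);
    if (grown == NULL) {
        free(copy);
        return 1;
    }
    list->items = grown;
    list->items[list->count++] = copy;
    return 0;
}

/* Release every stored line and reset the list to empty. */
void line_list_free(struct line_list *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = 0;
}

/* Like pick_lines, but appends matches to out instead of printing them.
 * Assumption: lines fit in the 1024-byte buffer. */
int collect_lines(const char *filename, const char *keyword,
                  long *offset, struct line_list *out)
{
    char buf[1024];
    FILE *fp;
    long pos;
    int rc = 0;

    if (offset == NULL || out == NULL || (fp = fopen(filename, "r")) == NULL)
        return 1;
    if (*offset > 0 && fseek(fp, *offset, SEEK_SET) != 0) {
        fclose(fp);
        return 1;
    }
    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (strstr(buf, keyword) != NULL && line_list_push(out, buf) != 0) {
            rc = 1;     /* out of memory: stop and report failure */
            break;
        }
    }
    if (rc == 0) {
        pos = ftell(fp);
        if (pos < 0)
            rc = 1;
        else
            *offset = pos;  /* only advance the offset after a full scan */
    }
    fclose(fp);
    return rc;
}
```

The caller starts with an empty list (items set to NULL, count set to 0), may call collect_lines repeatedly with the same offset and list, and calls line_list_free when done.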
Upvotes: 1
Reputation: 1074
It sounds like what you want to do is begin the file with a "header" which defines where the last result was found. This way, that information is written and stored in the file itself. An 8-digit hex value could be adequate for representing the offset in a file of size up to 4GB. Something like:
00000022<cr><lf>
Text...<cr><lf>
More text...<cr><lf>
~ <cr><lf> <-- this '~' is whatever we're looking for
Other stuff...<cr><lf>
I'm making some assumptions here. First, this is on Windows, where text lines are terminated by <cr> and <lf> characters (0x0D and 0x0A respectively). On Unix, including modern macOS, it will be <lf> only; on classic Mac OS it was <cr> only. I counted them in this example. And this is assuming ANSI-style strings, which means 8-bit encoding (one character = one byte of data). The same functionality can be achieved with Unicode or other string formats, just note that they may no longer be exactly one byte per character. (In UTF-16, most characters take two bytes; in UTF-8, a character takes one to four bytes. So expect trouble if mixing Unicode and ANSI string operations.)
Here, the "header" value is 0x22 or 34 decimal, and if you count all of the characters starting from the beginning of the file, the '~' is reached at the 34th count. So the "header" points to where the last search result was found.
How this works is like this: Initially this header value was zero, so your code would read this and know that the file hasn't been searched yet. Let's say the code scanned through the file, incrementing by one for each character, until it found the '~' character. Then it seeks back to the beginning, converts this count value into 8 text characters (sprintf with a format such as "%08X"; itoa is non-standard), and overwrites this part of the file with it. Once one is found, you are done, or you can process the whole thing again to search for more. The next time this file is processed, your code reads this header value and converts it from text into an unsigned integer (strtoul with base 16, since atoi only parses decimal), seeks the file to this offset (the count is one-based, so byte offset 34 is already just past the '~' and it won't be caught again), then starts scanning again.
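As a rough sketch of the header idea, assuming the file is opened in "r+b" mode so that byte counts match file offsets exactly: the names HEADER_DIGITS, read_header, write_header and find_next are invented for this example. It treats the stored value as the one-based count from the example (0x22 for the '~'), so seeking to that value resumes just past the previous match.

```c
#include <stdio.h>
#include <stdlib.h>

#define HEADER_DIGITS 8  /* e.g. "00000022" */

/* Read the 8-digit hex header at the start of fp. Returns 0 on success. */
int read_header(FILE *fp, unsigned long *value)
{
    char digits[HEADER_DIGITS + 1];
    char *end;

    if (fseek(fp, 0L, SEEK_SET) != 0)
        return 1;
    if (fread(digits, 1, HEADER_DIGITS, fp) != HEADER_DIGITS)
        return 1;
    digits[HEADER_DIGITS] = '\0';

    *value = strtoul(digits, &end, 16);   /* base 16: the header is hex text */
    return (end == digits + HEADER_DIGITS) ? 0 : 1;
}

/* Overwrite the header in place with value as 8 hex digits. Returns 0 on success. */
int write_header(FILE *fp, unsigned long value)
{
    if (value > 0xFFFFFFFFUL)             /* would not fit in 8 hex digits */
        return 1;
    if (fseek(fp, 0L, SEEK_SET) != 0)
        return 1;
    if (fprintf(fp, "%08lX", value) != HEADER_DIGITS)
        return 1;
    return fflush(fp) == 0 ? 0 : 1;
}

/* Scan for the next marker byte after the position stored in the header.
 * Returns 1 if found (header updated), 0 if not found, -1 on error. */
int find_next(FILE *fp, int marker)
{
    unsigned long count;
    int c;

    if (read_header(fp, &count) != 0)
        return -1;
    if (count < HEADER_DIGITS)
        count = HEADER_DIGITS;            /* first run: skip the header digits */
    if (fseek(fp, (long)count, SEEK_SET) != 0)
        return -1;

    while ((c = getc(fp)) != EOF) {
        count++;                          /* one-based count of bytes consumed */
        if (c == marker)
            return write_header(fp, count) == 0 ? 1 : -1;
    }
    return ferror(fp) ? -1 : 0;
}
```

Run against the example file with the header initially 00000000, the first find_next(fp, '~') call should leave 00000022 in the header, and a second call finds nothing further and leaves it unchanged.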
The others here have some good examples of actual code to start experimenting with. Note that if you're looking for more than just a character, such as a word or series of digits, the scanning portion becomes slower and more complex. Complex scanning of "tokens" instead of simple characters or words is called lexical analysis, and that is a whole other topic. Search for Flex, Bison, or Yacc.
Upvotes: 0
Reputation: 1223
You could do it like this (just pseudocode):
fopen();
offset = loadOffset();
fseek(offset); // set offset from previous run
while(!feof())
{
    fgets();
    if(searchKeyword() == true)
    {
        offset = ftell(); // getting the offset (after the line you just read)
        doSomething();
    }
}
saveOffset(offset);
fclose();
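Hint: Be careful with feof(); it returns true only after an input operation has failed because of EOF. If the file pointer is at EOF but no read has failed yet, it returns false, so the loop runs one extra time with a failed fgets(). You have to handle that case, for example by checking the return value of fgets().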
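A possible C version of the pseudocode, given as a sketch rather than a finished solution: it loops on the return value of fgets() instead of feof(), which sidesteps the problem from the hint. The helper names load_offset, save_offset and scan_new_lines are hypothetical, and storing the offset as decimal text in a separate state file is just one assumption about how it might be persisted. Unlike the pseudocode, it saves the position reached at EOF (after the last line searched, as the question describes) rather than the position after the last match. Lines are assumed to fit in 1023 bytes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read a saved offset from state_path, or 0 if none. */
long load_offset(const char *state_path)
{
    long offset = 0;
    FILE *fp = fopen(state_path, "r");

    if (fp == NULL)
        return 0;                   /* no saved state yet: start at the top */
    if (fscanf(fp, "%ld", &offset) != 1 || offset < 0)
        offset = 0;                 /* unreadable state: start at the top */
    fclose(fp);
    return offset;
}

/* Hypothetical helper: store offset as decimal text. Returns 0 on success. */
int save_offset(const char *state_path, long offset)
{
    FILE *fp = fopen(state_path, "w");
    int ok;

    if (fp == NULL)
        return 1;
    ok = fprintf(fp, "%ld\n", offset) > 0;
    return (fclose(fp) == 0 && ok) ? 0 : 1;
}

/* Print lines containing keyword that were added since the last run.
 * Returns the number of matching lines, or -1 on error. */
int scan_new_lines(const char *data_path, const char *state_path,
                   const char *keyword)
{
    char line[1024];
    int matches = 0;
    long offset;
    FILE *fp = fopen(data_path, "r");

    if (fp == NULL)
        return -1;
    if (fseek(fp, load_offset(state_path), SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }
    /* fgets() returns NULL at EOF or on error, so no feof() test is needed. */
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, keyword) != NULL) {
            fputs(line, stdout);
            matches++;
        }
    }
    if (ferror(fp) || (offset = ftell(fp)) < 0) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return save_offset(state_path, offset) == 0 ? matches : -1;
}
```

Calling scan_new_lines twice in a row on an unchanged file should report matches the first time and none the second; lines appended in between are picked up by the next call.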
Upvotes: 0