roxycandigit

Reputation: 1

Reading from a file and setting an offset?

I am new to C, and I am trying to build a C program that scans through a file until EOF, picks out lines that contain a certain keyword and then sets an offset after the last line was searched. When the scan is executed again, it scans the file, this time starting from the saved offset and continues downward until EOF.

I am trying to wrap my head around the different functions of File I/O and I'm having trouble piecing together the procedure to call fopen(), fseek(), fgets(), ftell(), etc to do what I want it to do. Can anyone point me in the right direction or walk me through what I need to get this done?

Thank you!

Upvotes: 0

Views: 6051

Answers (3)

MC93

Reputation: 799

I would recommend using getline (a POSIX function) for reading, ftell and fseek for getting and setting the offset, and strstr for searching individual lines.

I'm not sure I fully understand how you intend to use the saved offset, but it might look like this:

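#define _POSIX_C_SOURCE 200809L /* exposes getline() on POSIX systems */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int pick_lines(const char *filename, const char *keyword, long *offset)
{
    FILE *fp;
    char *line = NULL; /* buffer allocated and grown by getline() */
    size_t len = 0;
    long pos;

    if (offset == NULL || (fp = fopen(filename, "r")) == NULL)
        return 1;

    /* resume where the previous call stopped */
    if (*offset > 0 && fseek(fp, *offset, SEEK_SET) != 0) {
        fclose(fp);
        return 1;
    }

    while (getline(&line, &len, fp) != -1) {
        if (strstr(line, keyword) != NULL)
            printf("%s", line); // or do something else with chosen line
    }

    /* remember the position after the last line read */
    pos = ftell(fp);
    free(line);
    fclose(fp);
    if (pos < 0)
        return 1; /* leave *offset untouched on failure */

    *offset = pos;
    return 0;
}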

Here offset is an in/out parameter. Its dereferenced value is used to seek to a given offset (start with *offset == 0) and is then updated to the new offset.

This function would just print every line that contains keyword. If you want to return an array of lines instead, a little extra work is needed.

An example of usage might be:

long offset = 0;
pick_lines(filename, keyword, &offset);
// append lines to file
pick_lines(filename, keyword, &offset);
// ...
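
If the matching lines should be returned rather than printed, the extra work mentioned above might look roughly like the following sketch. The struct line_list type and the collect_lines and line_list_free names are hypothetical and not part of the code above; the sketch also assumes a POSIX system for getline() and strdup().

```c
#define _POSIX_C_SOURCE 200809L /* for getline() and strdup() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the matching lines. */
struct line_list {
    char **items;  /* heap-allocated copies of matching lines */
    size_t count;  /* number of lines stored */
    size_t cap;    /* number of allocated slots */
};

/* Release every stored line and reset the list to empty. */
void line_list_free(struct line_list *list)
{
    for (size_t i = 0; i < list->count; i++)
        free(list->items[i]);
    free(list->items);
    list->items = NULL;
    list->count = list->cap = 0;
}

/* Append a copy of line; returns 0 on success, 1 if out of memory. */
static int line_list_push(struct line_list *list, const char *line)
{
    if (list->count == list->cap) {
        size_t new_cap = list->cap ? list->cap * 2 : 8;
        char **tmp = realloc(list->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return 1;
        list->items = tmp;
        list->cap = new_cap;
    }
    if ((list->items[list->count] = strdup(line)) == NULL)
        return 1;
    list->count++;
    return 0;
}

/* Like pick_lines(), but appends matches to *out instead of printing them. */
int collect_lines(const char *filename, const char *keyword,
                  long *offset, struct line_list *out)
{
    assert(offset != NULL && out != NULL); /* programmer error otherwise */

    FILE *fp = fopen(filename, "r");
    if (fp == NULL)
        return 1;

    char *line = NULL;
    size_t len = 0;
    int rc = 0;

    if (*offset > 0 && fseek(fp, *offset, SEEK_SET) != 0)
        rc = 1;

    while (rc == 0 && getline(&line, &len, fp) != -1) {
        if (strstr(line, keyword) != NULL)
            rc = line_list_push(out, line);
    }

    if (rc == 0) {
        long pos = ftell(fp);
        if (pos < 0)
            rc = 1;
        else
            *offset = pos; /* position after the last line read */
    }

    free(line);
    fclose(fp);
    return rc;
}
```

The caller starts with a zero-initialized struct line_list and offset 0, may call collect_lines repeatedly as the file grows, and calls line_list_free when done. Each stored line keeps its trailing newline, just as getline() returns it.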

Upvotes: 1

rdtsc

Reputation: 1074

It sounds like what you want to do is begin the file with a "header" which defines where the last result was found. This way, that information is written and stored in the file itself. An 8-digit hex value could be adequate for representing the offset in a file of size up to 4GB. Something like:

00000022<cr><lf>
Text...<cr><lf>
More text...<cr><lf>
~ <cr><lf>  <-- this '~' is whatever we're looking for
Other stuff...<cr><lf>

I'm making some assumptions here. First, this is on Windows, where text lines are terminated by <cr> and <lf> characters (0x0D and 0x0A respectively). On Unix it will be <lf> only. On a Mac it may be <cr> only (classic Mac OS) or <lf> only (macOS). I counted them in this example. Second, this assumes ANSI-style strings, meaning an 8-bit encoding (one character = one byte of data). The same functionality can be achieved with Unicode or other string formats; just note that they may no longer be exactly one byte per character. (UTF-16 uses two or four bytes per character and UTF-8 uses one to four, so expect trouble if mixing Unicode and ANSI string operations.)

Here, the "header" value is 0x22 or 34 decimal, and if you count all of the characters starting from the beginning of the file, the '~' is reached at the 34th count. So the "header" points to where the last search result was found.

It works like this: initially the header value is zero, so your code reads it and knows that the file hasn't been searched yet. Let's say the code scans through the file, incrementing a counter for each character, until it finds the '~' character. Then it seeks back to the beginning, converts the count into 8 text characters (sprintf, or the non-standard itoa), and overwrites that part of the file with it. Once one is found you are done, or you can process the whole thing again to search for more. The next time this file is processed, your code reads the header value, converts it from hex text into an unsigned integer (strtoul with base 16; atoi would parse it as decimal), seeks the file to just past the previous match (since we don't want to catch that one again), then starts scanning again. With a 1-based count like the 34 above, "just past the match" is byte offset 34 itself; if you store a 0-based offset instead, seek to that offset plus one.
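
A minimal sketch of this header scheme in C, under some stated assumptions: the header is exactly 8 hex digits at byte 0, it stores the 1-based count of the last match (0 meaning "not searched yet"), and the file is opened in binary mode so that offsets are raw byte counts. The names read_header, write_header, and find_next are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define HEADER_DIGITS 8 /* assumed: header is exactly 8 hex digits at byte 0 */

/* Read the 8-digit hex header; returns 0 on success, 1 on error. */
int read_header(FILE *fp, unsigned long *value)
{
    char buf[HEADER_DIGITS + 1];
    char *end;

    if (fseek(fp, 0L, SEEK_SET) != 0 ||
        fread(buf, 1, HEADER_DIGITS, fp) != HEADER_DIGITS)
        return 1;
    buf[HEADER_DIGITS] = '\0';
    *value = strtoul(buf, &end, 16);
    return (end == buf + HEADER_DIGITS) ? 0 : 1; /* reject non-hex content */
}

/* Overwrite the header in place with value as 8 uppercase hex digits. */
int write_header(FILE *fp, unsigned long value)
{
    char buf[HEADER_DIGITS + 1];

    if (value > 0xFFFFFFFFUL)
        return 1; /* would not fit in 8 digits */
    snprintf(buf, sizeof buf, "%08lX", value);
    if (fseek(fp, 0L, SEEK_SET) != 0 ||
        fwrite(buf, 1, HEADER_DIGITS, fp) != HEADER_DIGITS)
        return 1;
    return fflush(fp) == 0 ? 0 : 1;
}

/*
 * Find the next occurrence of target after the position recorded in the
 * header. On a hit, store its 1-based byte count in the header and in
 * *found, and return 0. Return 1 if there is no further match or on error.
 */
int find_next(const char *filename, char target, unsigned long *found)
{
    assert(found != NULL);
    FILE *fp = fopen(filename, "r+b"); /* binary: offsets are byte counts */
    unsigned long last;
    int c, rc = 1;

    if (fp == NULL)
        return 1;
    if (read_header(fp, &last) == 0) {
        /* Count k is byte offset k-1, so offset k is just past the last hit.
           On a fresh file (header 0), start right after the header digits. */
        unsigned long pos = last > HEADER_DIGITS ? last : HEADER_DIGITS;
        if (fseek(fp, (long)pos, SEEK_SET) == 0) {
            while ((c = getc(fp)) != EOF) {
                pos++; /* pos is now the 1-based count of c */
                if (c == (unsigned char)target) {
                    *found = pos;
                    rc = write_header(fp, pos);
                    break;
                }
            }
        }
    }
    fclose(fp);
    return rc;
}
```

Calling find_next(path, '~', &hit) repeatedly walks through the matches one at a time, with the file itself remembering where the previous call stopped. Because fseek takes a long, a platform with a 32-bit long cannot reach the full 4GB that 8 hex digits can describe.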

The others here have some good examples of actual code to start experimenting with. Note that if you're looking for more than just a character, such as a word or series of digits, the scanning portion becomes slower and more complex. Complex scanning of "tokens" instead of simple characters or words is called lexical analysis, and that is a whole other topic. Look up Flex and Bison (or Lex and Yacc) if you want to go down that road.

Upvotes: 0

robin.koch

Reputation: 1223

You could do it like this (just pseudocode):

fopen();
offset = loadOffset();
fseek(offset); // set offset from previous run
while(fgets() != NULL) // stops when a read fails (EOF or error)
{
  if(searchKeyword() == true)
  {
    offset = ftell(); // getting the offset (after the line you just read)
    doSomething();

  }
}
saveOffset(offset);
fclose();

Hint: Be careful with feof(); it returns true only after an input operation has failed because of EOF. If the file position is at EOF but no read has failed yet, it returns false. You have to handle that case, for example by checking the return value of fgets() instead of relying on feof() alone.
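
The pseudocode above could be fleshed out along these lines. This is only a sketch under a few assumptions: the offset is kept in a separate, hypothetical state file (the load_offset and save_offset helpers below are one way to fill in loadOffset() and saveOffset()), lines are assumed to fit in 4096 bytes, and the offset is taken once after the loop, that is, after the last line read rather than after the last match.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the saved offset from the state file; 0 if none exists. */
long load_offset(const char *state_path)
{
    long offset = 0;
    FILE *sp = fopen(state_path, "r");

    if (sp != NULL) {
        if (fscanf(sp, "%ld", &offset) != 1 || offset < 0)
            offset = 0; /* unreadable state: start from the beginning */
        fclose(sp);
    }
    return offset;
}

/* Write the offset to the state file; returns 0 on success. */
int save_offset(const char *state_path, long offset)
{
    FILE *sp = fopen(state_path, "w");

    if (sp == NULL)
        return 1;
    int ok = fprintf(sp, "%ld\n", offset) > 0;
    return (fclose(sp) == 0 && ok) ? 0 : 1;
}

/*
 * Scan data_path from the saved offset to EOF, print lines containing
 * keyword, and save the new offset. Returns the number of matching
 * lines, or -1 on error.
 */
int scan_new_lines(const char *data_path, const char *state_path,
                   const char *keyword)
{
    assert(keyword != NULL);
    char line[4096]; /* assumed maximum line length */
    int matches = 0;
    FILE *fp = fopen(data_path, "r");

    if (fp == NULL)
        return -1;
    long offset = load_offset(state_path);
    if (fseek(fp, offset, SEEK_SET) != 0) {
        fclose(fp);
        return -1;
    }
    /* fgets() returns NULL at EOF or on error, so no feof() test is needed. */
    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, keyword) != NULL) {
            fputs(line, stdout);
            matches++;
        }
    }
    int failed = ferror(fp);
    offset = ftell(fp); /* position after the last line read */
    fclose(fp);
    if (failed || offset < 0 || save_offset(state_path, offset) != 0)
        return -1;
    return matches;
}
```

Because the offset lives in its own file, it survives between separate runs of the program: a second call on an unchanged data file finds nothing new, and lines appended later are picked up by the next call.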

Upvotes: 0
