Je Rog
Je Rog

Reputation: 5993

Given a FILE * , how to find the offset of the 1st occurence of "abc" efficiently?

How to do this kind of job efficiently in C?

What I can think of is first load the whole file into memory and then search though it..

But is there a more efficient way?

UPDATE

To load the whole file into memory will be impossible if the file is extremely big.

Upvotes: 1

Views: 197

Answers (4)

John Gordon
John Gordon

Reputation: 2704

Loading the whole file into memory is unnecessary and inefficient. Try something like this:

FILE *fl;
int cc = getc(fl);
while (cc != EOF)
{
   if (cc=='a')
   {
     cc = getc(fl);
     if (cc=='b')
     {
       cc = getc(fl);
       if (cc=='c')
          return "FOUND";
      }
    }
    cc = getc(fl);
  }
  return "NOT FOUND";

Obviously you would never actually use code like this. You should write a function that takes an arbitrary string to search for, but the algorithm is basically the same. Also the I/O will be buffered by the system so you shouldn't have to worry about efficiency of reading a single character at a time. Also I didn't include any error checking.

Upvotes: 2

Chris Eberle
Chris Eberle

Reputation: 48775

What OS are you using? If it's Linux you can use a memory map to automatically map a certain portion of memory directly to the file. It's considered much faster.

EDIT

mmap doesn't load the whole file into memory at once. It's just more efficient.

Upvotes: 0

Ben Jackson
Ben Jackson

Reputation: 93710

For string searches there are many interesting algorithms. For example in Boyer-Moore you would take advantage of the fact that the 3rd position must be 'c' if you are to match 'abc', and if it is not 'c' then a table would say how far to advance (for example if it's 'd' you can skip ahead 3 because the first 3 letters can't be interesting to you at all).

However, interesting string search methods are not going to matter at all versus the time spent reading the file. You should avoid reading it all in if you want to handle arbitrary files because the extra memory use is wasteful and will slow you down. But you can't avoid reading the entire file up to the point where you find your string.

Upvotes: 0

Foo Bah
Foo Bah

Reputation: 26251

you can read in the file block-by-block and search for "abc" in each block. There are algorithms like the Boyer-Moore search to reduce the number of characters you have to explicitly check.

in linux, you can use posix_fadvise to tell it that you will be slurping the file.

Upvotes: 2

Related Questions