Suppose I have a very long string, such as a filepath, and I want to search for something in it. For example, something like the $ find
command. It seems like a basic implementation of this would be along the lines of:
if (strstr(sent, word) != NULL) {
    return 1;
}
return 0;
Would there be any performance difference between doing that and something like Boyer Moore? Or does strstr
already do something just as efficient?
Basically, I have about a billion very long strings, and I'm looking to do a fast(ish) find on them (without any indexing), based on the most efficient substring implementation. What should I use?
Update: To give a more concrete example, let's say I have a billion filepaths I want to search through:
/archive/1002/myfile.txt
/archive/1002/newer.mov
/user/tom/local_2014version1.mov
And from this I would search either one or more strings. Example samples would be:
"1002" // would return the first two rows
"mov version tom" // would return the third row (the only path containing all three terms)
Upvotes: 0
Views: 283
Reputation: 13181
Advanced search algorithms like Boyer-Moore and Aho-Corasick work by precomputing lookup tables from the string(s) to be searched for, which incurs a large start-up time. It's very unlikely that searching something as small as a pathname would be able to make up for that high overhead. You really have to be searching something like multi-page documents before those algorithms show their value.
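To make the start-up cost concrete, here is a sketch of the per-pattern preprocessing a Boyer-Moore-style searcher does, using Horspool's simplified bad-character table (this is illustrative code I wrote, not anything from the answer; glibc's strstr already uses a linear-time Two-Way search internally):

```c
#include <stddef.h>
#include <string.h>

#define ALPHABET 256

/* Build Horspool's bad-character shift table for `pat`.
 * This O(pattern length + alphabet size) setup, done before any
 * text is examined, is the overhead the answer refers to. */
static void build_shifts(const char *pat, size_t m, size_t shift[ALPHABET]) {
    for (size_t i = 0; i < ALPHABET; i++)
        shift[i] = m;                       /* default: skip whole pattern */
    for (size_t i = 0; i + 1 < m; i++)
        shift[(unsigned char)pat[i]] = m - 1 - i;
}

/* Boyer-Moore-Horspool search: first occurrence of `pat` in `text`,
 * or NULL if absent. */
const char *horspool(const char *text, const char *pat) {
    size_t n = strlen(text), m = strlen(pat);
    if (m == 0) return text;
    if (m > n) return NULL;
    size_t shift[ALPHABET];
    build_shifts(pat, m, shift);            /* paid once per pattern */
    for (size_t pos = 0; pos + m <= n; ) {
        if (memcmp(text + pos, pat, m) == 0)
            return text + pos;
        pos += shift[(unsigned char)text[pos + m - 1]];
    }
    return NULL;
}
```

For a 20-character pathname the table fill alone touches 256 slots, more work than a naive scan of the whole path; the skipping only starts to pay off on long texts.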
Upvotes: 2