Suppose I have a very long string, such as a filepath, and I want to search for something in it. For example, something like the $ find
command. It seems like a basic implementation of this would be along the lines of:
if (strstr(sent, word) != NULL) {
    return 1;
}
return 0;
Would there be any performance difference between doing that and something like Boyer Moore? Or does strstr
already do something just as efficient?
Basically, I have about a billion very long strings, and I'm looking to do a fast(ish) find on them (without any indexing), based on the most efficient substring implementation. What should I use?
Update: To give a more concrete example, let's say I have a billion filepaths I want to search through:
/archive/1002/myfile.txt
/archive/1002/newer.mov
/user/tom/local_2014version1.mov
And from this I would search either one or more strings. Example samples would be:
"1002" // would return the first two rows
"mov version tom" // would return the third row (the only path containing all three terms)
Upvotes: 0
Views: 283
Reputation: 13181
Advanced search algorithms like Boyer-Moore and Aho-Corasick work by precomputing lookup tables from the string(s) to be searched for, which incurs a large start-up time. It's very unlikely that searching something as small as a pathname would be able to make up for that high overhead. You really have to be searching something like multi-page documents before those algorithms show their value.
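To make the start-up cost concrete, here is a sketch of the per-pattern preprocessing a Boyer-Moore-style searcher does, using Horspool's simplified bad-character table (this is illustrative code I wrote, not anything from the answer; glibc's strstr already uses a linear-time Two-Way search internally):

```c
#include <stddef.h>
#include <string.h>

#define ALPHABET 256

/* Build Horspool's bad-character shift table for `pat`.
 * This O(pattern length + alphabet size) setup, done before any
 * text is examined, is the overhead the answer refers to. */
static void build_shifts(const char *pat, size_t m, size_t shift[ALPHABET]) {
    for (size_t i = 0; i < ALPHABET; i++)
        shift[i] = m;                       /* default: skip whole pattern */
    for (size_t i = 0; i + 1 < m; i++)
        shift[(unsigned char)pat[i]] = m - 1 - i;
}

/* Boyer-Moore-Horspool search: first occurrence of `pat` in `text`,
 * or NULL if absent. */
const char *horspool(const char *text, const char *pat) {
    size_t n = strlen(text), m = strlen(pat);
    if (m == 0) return text;
    if (m > n) return NULL;
    size_t shift[ALPHABET];
    build_shifts(pat, m, shift);            /* paid once per pattern */
    for (size_t pos = 0; pos + m <= n; ) {
        if (memcmp(text + pos, pat, m) == 0)
            return text + pos;
        pos += shift[(unsigned char)text[pos + m - 1]];
    }
    return NULL;
}
```

For a 20-character pathname the table fill alone touches 256 slots, more work than a naive scan of the whole path; the skipping only starts to pay off on long texts.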
Upvotes: 2