Reputation: 339
I have a vector of strings loaded from a file via ifstream
, it does not need to be a vector but it looks like this:
std::vector<std::string> data;
data.push_back("CONSISTENTTEXT:variabletext1");
data.push_back("CONSISTENTTEXT:variabletext2");
// 100,000 + more push_back's
As it's a large vector and I need to loop through to find all references of a search string I'm wondering how to do the most efficient string case insensitive find given I can ignore the first 15 characters of the string?
Upvotes: 2
Views: 882
Reputation: 25337
The first thing to try is just the obvious:
std::copy_if(data.begin(), data.end(), output,
[&searchForMe](std::string const& str) {
return caseInsensitiveEquals(
std::string_view{str}.remove_prefix(15),
std::string_view{searchForMe}.remove_prefix(15));
});
See std::copy_if
and std::string_view::remove_prefix
You'd also probably just want to remove the prefixes before you even add it to data
. This would be a significant reduction in your memory usage and could potentially make your strings fit within the small-string-optimization size. Searching would then be really easy:
std::copy_if(data.begin(), data.end(), output,
[&searchForMe](std::string const& str) { // maybe `std::string_view str`
return caseInsensitiveEquals(str, searchForMe);
});
If this is not fast enough (likely it is not fast enough as your vector is large), there are some options. You could always reach for the Execution Policies, but they don't help reduce the amount of work that needs to be done.
The next thing I would consider is std::equal_range
, which would require sorting the data:
// assuming you removed all the prefixes
std::sort(data.begin(), data.end(), caseInsensitiveLessThan);
auto rangePair = std::equal_range(data.begin(), data.end(), searchForMe, caseInsensitiveLessThan);
After the O(n log n)
sort, the lookup is O(log n)
.
If this is still not fast enough, or you perhaps cannot pay for a sort, you need a specialized data structure. You might be able to get away with a std::unordered_multiset<std::string, CaseInsensitiveHash, CaseInsensitiveEquals>
.
Upvotes: 1