Reputation: 631
I have some text stored in a string. Whenever I see a specific character sequence, I want to insert some characters right after the pattern (shifting all existing characters in the string to later / higher indexes in the string). I'm thinking that the most efficient way to do this is to reserve a large character array (large because I don't know exactly how many inserts will be needed, but I do know that the total number of characters added will be less than the length of the original string) and then iterate over the original string, copy the characters into the new character array and then whenever the character pattern is recognized, insert the new string, and then continue copying characters from the source / original string. Can anyone think of a faster or better approach? This will be done frequently so I want to optimize it as much as practical.
Update: A couple people recommended going the std::string route instead of character array to avoid memory management associated with character array.
The pattern that I am looking for is a 5 character string then I keep looking until I see the newline character and then at that point append 3 or 5 characters. I would implement it by doing something like this:
bool matchedstart = false;
std::string origstr;
unsigned int strlength = origstr.length();
int strlengthm5 = origstr.length() - 5;
for(int i = 0, j = 0; i < strlength; i++, j++) {
if(!matchedstart && i < strlengthm5) {
if(origstr[i] == 't' && origstr[i+1] == 'n' && origstr[i+2] = 'a'...) {
matchedstart = true;
}
}
else if(origstr[i] == '\n') {
//append extra text here
matchedstart = false;
}
outputstr[j] = origstr[i];
}
Is that algorithm more efficient than string.find()? I suspect that it is because I have hard-coded my input text into the algorithm above. I suspect that string.find() would involve a short inner for-loop proportional to the length of the string although maybe that won't save much time over the compiler-optimized short-circuit evaluation involved in my if-chain. I guess I'll have to profile this to see how much overhead is involved with string. I'll post my findings later.
Upvotes: 1
Views: 2490
Reputation: 595782
You can use std::string
, which has find()
and insert()
methods, eg:
std::string str = "whatever you want to search in...";
std::string seq = "what to find";
auto pos = str.find(seq);
if (pos != std::string::npos)
str.insert(pos + seq.length(), "what to insert");
If you want to replace multiple instances of the sequence, find()
has an optional pos
parameter to specify the starting index to search from:
std::string str = "whatever you want to search in...";
std::string seq = "what to find";
std::string ins = "what to insert";
auto pos = str.find(seq);
while (pos != std::string::npos)
{
pos += seq.length();
str.insert(pos, ins);
pos = str.find(seq, pos + ins.length());
}
Since you say that you "know that the total number of characters added will be less than the length of the original string", you can use std:string::reserve()
to increase the string's capacity to avoid re-allocations during the inserts:
std::string str = "whatever you want to search in...";
std::string seq = "what to find";
std::string ins = "what to insert";
auto pos = str.find(seq);
if (pos != std::string::npos)
{
str.reserve(str.length() * 2);
do
{
pos += seq.length();
str.insert(pos, ins);
pos = str.find(seq, pos + ins.length());
}
while (pos != std::string::npos);
str.shrink_to_fit();
}
update: if insert()
proves to be too slow, you might consider building up a second std::string
so you don't waste time shifting characters around in the original std::string
, eg:
std::string str = "whatever you want to search in...";
std::string seq = "what to find";
std::string ins = "what to insert";
std::string newStr;
auto foundPos = str.find(seq);
if (foundPos == std::string::npos)
{
newStr = str;
}
else
{
newStr.reserve(str.length() * 2);
decltype(foundPos) startPos = 0;
auto ptr = str.c_str();
do
{
foundPos += seq.length();
newStr.append(ptr + startPos, foundPos - startPos);
newStr.append(ins);
startPos = foundPos;
foundPos = str.find(seq, startPos);
}
while (foundPos != std::string::npos);
newStr.append(ptr + startPos, str.length() - startPos);
}
Upvotes: 5
Reputation: 4654
First of all, use std::string
instead of torturing yourself with character arrays.
Your approach is pretty good, and the only way I can think of to optimize it will be the part searching for the pattern. What you're describing now seems to use a naive string search, where you try to match the pattern at every position. This takes O(nm)
, but there are algorithms that can do it faster.
You should use std::string::find
, which should provide a pretty efficient algorithm to do it in something like O(n + m)
, although the standard doesn't guarantee it.
Upvotes: 1