Reputation: 58
I was using the following code to append strings
string res = string.Empty;
int ite = 100000;
for(int i= 0; i < ite; i++)
{
res += "5";
}
It was taking a lot of time, so I later changed the code to
string res = string.Empty;
int ite = 100000;
res = getStr(ite / 2) + getStr(ite - (ite / 2));
//body of getStr method
private static string getStr(int p)
{
if (p == 1)
return "5";
else if (p == 0)
return string.Empty;
string r1 = getStr(p / 2); //recursive
string r2 = getStr(p - (p / 2)); //recursive
return (r1 + r2);
}
which in my opinion does nothing actually as the number of times strings are concatenated is approximately the same as in previous approach.
But using this approach there was significant improvement in performance as the code which was taking around 2500 ms (on my machine) was now taking 10 ms.
I ran a profiler over cpu time and couldn't arrive at an understanding as to why there is an improvement in performance. Can anyone please explain this.
Note: I am intentionally not using StringBuilder, in order to understand the above.
Upvotes: 4
Views: 993
Reputation: 1420
Assuming - as per Matt Burland's answer - that the time cost of creating a string of length n by one of the given algorithms is dominated by the number of characters copied, the observed timings may be largely explained by calculating this for both algorithms. This yields O(n2) and O(n log n) and, for length 10,000, a ratio of 348:1. The algorithm may be improved to O(n) in Java, but apparently not in .NET.
Inspection of the improved algorithm shows that the number c(n) of characters copied obeys the following recurrence relation:
c(0) = 0
c(1) = 1
c(n) = c(⌊n/2⌋) + c(⌈n/2⌉) + n
This may be solved to yield
c(2k + a) = (k + 1)2k + (k + 3)a
where k and a are chosen so that n = 2k + a , a < 2k ; this is readily verified by complete induction. This is O(k 2k), i.e. O(n log2n), i.e. O(n log n),
The original algorithm clearly copies n(n+1)/2 characters, and is thus O(n2).
The revised algorithm clearly copies far fewer characters; for the given 10,000 character string:
c(10000) =
c(213 + 1808) =
(13+1) * 8192 + 16 * 1808 =
143,616
while the original algorithm copies 50,005,000 characters, a ratio of approximately 1 : 348, consistent to well within an order of magnitude with the observed ratio of 1:250. The imperfect match does suggest that other factors such as memory management may be significant.
Given that the string is filled with a single character,
and assuming that String.Substring
does not make a copy of the string,
which is true in Java but not .NET according to comparison-of-substring-operation-performance-between-net-and-java ,
we can improve on the second algorithm (without using StringBuilder
or String('5', ite)
)
by continually doubling the constructed string, adding an extra character when necessary:
private static string getStr(int p)
{
if(p == 0)
return "";
if(p == 1)
return "5";
string s = getStr ((p+1) / 2);
if( s.Length + s.Length == p )
return s + s;
else
return s + s.Substring(0, p - s.Length);
}
For the number of characters c2(n) copied by this algorithm we have
c2(n) = n + c2(⌈n/2⌉)
from which we may derive
c2(n) = 2_n_ + d(n)
where d(n) is -1 if n is a power of 2 and is otherwise the number of “internal” (i.e. neither leading nor trailing) digits equal to 0 in the binary expansion of m; equivalently, d(n) is defined by the first matching case for m ∈ ℕ in:
d(2m) = -1
d(2 m) = d(m)
d(m) = the number of essential (non-leading) 0 binary digits in m
The expression for c2 may also be verified by complete induction and is O(n + log n), i.e. O(n).
It is fairly straightforward to remove the recursion from this algorithm.
In the OP’s case this algorithm copies c2(10,000) = 20,000 + d(110000110101000002) = 20,006 characters and would thus appear to be some 7 times faster.
"5"
.String('5', ite)
.StringBuilder
to build a string of known size, one may reduce allocations by using StringBuilder(capacity)
.'\0'
!), copy in the string to be repeated and then repeatedly append a copy of the filled part of the buffer until it is full.Upvotes: 1
Reputation: 45135
You need to think about why string concatenation is slow. Strings are immutable, so when you do:
someString+= "5";
You have to copy the entire contents of someString
and to another string that is one larger and then copy in the 5
part. If you think about it, this gets slower and slower the longer the string gets.
With your recursive function you are doing a divide and conquer strategy to help to minimize the number of big string concatenations you need. For example, if you had a length of 8, in the first case you'd be doing:
"5" + "5"
"55" + "5"
"555" + "5"
"5555" + "5"
"55555" + "5"
"555555" + "5"
"5555555" + "5" // 7 total concatenations
In your recursive model you are doing:
"5" + "5" // Four times
"55" + "55" // twice
"5555" + "5555" // once
So you are doing less big concatenations.
And, of course, I assume the OP knows this from their comment, but for anybody else; if you need to concatenate any non-trivial number of strings, use StringBuilder as it is optimized for building strings by Append
ing them together.
Upvotes: 15