Reputation: 1657
I am trying to analysis time complexity of below function. This function is used to check if a string is made of other strings.
set<string> s; // s has been initialized and stores all the strings
bool fun(string word) {
int len = word.size();
// something else that can also return true or false with O(1) complexity
for (int i=1; i<=len; ++i) {
string prefix = word.substr(0,i);
string suffix = word.substr(i);
if (prefix in s && fun(suffix))
return true;
else
return false;
}
}
I think the time complexity is O(n)
where n is the length of word (am I right?). But as the recursion is inside the loop, I don't know how to prove it.
Edit:
This code is not a correct C++
code (e.g., prefix in s
). I just show the idea of this function, and want to know how to analysis its time complexity
Upvotes: 0
Views: 2819
Reputation: 234795
The way to analyze this is by developing a recursion relationship based on the length of the input and the (unknown) probability that a prefix is in s
. Let's assume that the probability of a prefix being in s
is given by some function pr(L) of the length L of the prefix. Let the complexity (number of operations) be given by T(len).
If len == 0 (word
is the empty string), then T = 1. (The function is missing a final return
statement after the loop, but we're assuming that the actual code is only a sketch of the idea, not what's actually executing).
For each loop iteration, denote the loop body complexity by T(len; i). If the prefix is not in s
, then the body has constant complexity (T(len; i) = 1). This event has probability 1 - pr(i).
If the prefix is in s
, then the function returns true
or false
according to the recursive call to fun(suffix)
, which has complexity T(len - i). This event has probability pr(i).
So for each value of i
, the loop body complexity is:
T(len; i) = 1 * (1 - pr(i)) + T(len - i) * pr(i)
Finally (and this depends on the intended logic, not the posted code), we have
T(len) = sum i=1...len(T(len; i))
For simplicity, let's treat pr(i) as a constant function with value 0.5. Then the recursive relationship for T(len) is (up to a constant factor, which is unimportant for O() calculations):
T(len) = sum i=1...len(1 + T(len - i)) = len + sum i=0...len-1(T(i))
As noted above, the boundary condition is T(0) = 1. This can be solved by standard recursive function methods. Let's look at the first few terms:
len T(len)
0 1
1 1 + 1 = 2
2 2 + 2 + 1 = 5
3 3 + (4 + 2 + 1) = 11
4 4 + (11 + 5 + 2 + 1) = 23
5 5 + (23 + 11 + 5 + 2 + 1) = 47
The pattern is clearly T(len) = 2 * T(len - 1) + 1. This corresponds to exponential complexity:
T(n) = O(2n)
Of course, this result depends on the assumption we made about pr(i). (For instance, if pr(i) = 0 for all i, then T(n) = O(1). There would also be non-exponential growth if pr(i) had a maximum prefix length—pr(i) = 0 for all i > M for some M.) The assumption that pr(i) is independent of i is probably unrealistic, but this really depends on how s
is populated.
Upvotes: 4
Reputation: 4508
Assuming that you've fixed the bugs others have noted, then the i
values are the places that the string is being split (each i
is the leftmost splitpoint, and then you recurse on everything to the right of i
). This means that if you were to unwind the recursion, you are looking at up to n-1
different split points, and asking if each substring is a valid word. Things are ok if the beginning of word
doesn't have a lot of elements from your set, since then you can skip the recursion. But in the worst case, prefix in s
is always true, and you try every possible subset of the n-1
split points. This gives 2^{n-1}
different splitting sets, multiplied by the length of each such set.
Upvotes: 0