stringalgorithmrecursionbig-otime-complexity

Reputation: 413

Time Complexity of String Subsequence Recursion

How to calculate the time complexity of the algorithm below? Can someone please explain to me briefly:

public static void print(String prefix, String remaining, int k) {
    if (k == 0) {
        StdOut.println(prefix);
        return;
    }
    if (remaining.length() == 0) return;
    print(prefix + remaining.charAt(0), remaining.substring(1), k-1);
    print(prefix, remaining.substring(1), k);
}


public static void main(String[] args) { 
    String s = "abcdef";
    int k = 3;
    print("", s, k);
}

Upvotes: 1

Answers (3)

Kristjan Kiolein

Reputation: 103

Introduction

As the body is O(1), or at least can be rewritten as O(1), we only have to look for how many time the function is called. So the time complexity of this algorithm will be how many times the function will be called in relation to length of input word and length of output prefix.
n - length of input word
k - length of prefix being searched for

I have never done something like this before and common methods for finding time complexity for recursive methods that I know of don't work in this case. I started off by looking how many calls to the function were made depending on n and k to see if I can spot any patterns that might help me.

Gathering data

Using this code snippet (sorry for ugly code):

public static String word = "abcdefghij";
public static int wordLength = word.length();
public static int limit = 10;
public static int access = 0;

System.out.printf("Word length      : %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d %6d\n",0,1,2,3,4,5,6,7,8,9,10);
System.out.printf("-----------------------------------------------------------------------------------------------\n");
for(int k = 0; k <= limit; k++) {
    System.out.printf("k : %2d - results :", k);
        for(int i = 0; i <= limit; i++) {
        print("", word.substring(0,i), k);
        System.out.printf(", %5d", access);
        access=0;
    }
        System.out.print("\n");
}
print(prefix, remaining, k) {
    access++;
    ... rest of code...
}

From this I got :

Word length      :      0      1      2      3      4      5      6      7      8      9     10
-----------------------------------------------------------------------------------------------
k :  0 - results :,     1,     1,     1,     1,     1,     1,     1,     1,     1,     1,     1
k :  1 - results :,     1,     3,     5,     7,     9,    11,    13,    15,    17,    19,    21
k :  2 - results :,     1,     3,     7,    13,    21,    31,    43,    57,    73,    91,   111
k :  3 - results :,     1,     3,     7,    15,    29,    51,    83,   127,   185,   259,   351
k :  4 - results :,     1,     3,     7,    15,    31,    61,   113,   197,   325,   511,   771
k :  5 - results :,     1,     3,     7,    15,    31,    63,   125,   239,   437,   763,  1275
k :  6 - results :,     1,     3,     7,    15,    31,    63,   127,   253,   493,   931,  1695
k :  7 - results :,     1,     3,     7,    15,    31,    63,   127,   255,   509,  1003,  1935
k :  8 - results :,     1,     3,     7,    15,    31,    63,   127,   255,   511,  1021,  2025
k :  9 - results :,     1,     3,     7,    15,    31,    63,   127,   255,   511,  1023,  2045
k : 10 - results :,     1,     3,     7,    15,    31,    63,   127,   255,   511,  1023,  2047

At least one call is made in every case as it is the initial call. From here we see that everything on and below main diagonal is called 2^{min(n, k) + 1} - 1 times, like mentioned by others. That is number of nodes in binary tree. Each time a the print method is called, two new ones will be called - making a binary tree.

Things get more confusing above the main diagonal though and I failed to see any common pattern.

Graph representation of algorithm

To make it more visual I used graphviz (online version).

Here is code snippet that generates code for given n and k for graphviz (green nodes are ones from where solution was found):

public static String word = "abcde";
public static int wordLength = word.length();
public static int limit = 3;

public static void main(String[] args) {

    String rootNode = "\"prefix|remaining|k\"";
    StringBuilder graph = new StringBuilder("digraph G { \n node [style=filled];");
    print("", word, limit, graph, rootNode);
    graph.append("\n\"prefix|remaining|k\" [shape=Mdiamond];\n}");
    System.out.println(graph);
}

public static void print(String prefix, String remaining, int k, StringBuilder sb, String parent) {

    String currentNode =  "\"" + prefix + "|" + (remaining.isEmpty() ? 0 : remaining) +  "|" + k + "\"";
    sb.append("\n " + parent + "->" + currentNode + ";");

    if(k == 0) {
        sb.append("\n " + currentNode + "[color=darkolivegreen3];");
        return;
    }

    if (remaining.length() == 0)return;

    print(prefix + remaining.charAt(0), remaining.substring(1), k - 1, sb,currentNode);
    print(prefix, remaining.substring(1), k, sb, currentNode);
}

Example of graph (n=5, k=3):

digraph G { 
 node [style=filled];
 "prefix|remaining|k"->"|abcde|3";
 "|abcde|3"->"a|bcde|2";
 "a|bcde|2"->"ab|cde|1";
 "ab|cde|1"->"abc|de|0";
 "abc|de|0"[color=darkolivegreen3];
 "ab|cde|1"->"ab|de|1";
 "ab|de|1"->"abd|e|0";
 "abd|e|0"[color=darkolivegreen3];
 "ab|de|1"->"ab|e|1";
 "ab|e|1"->"abe|0|0";
 "abe|0|0"[color=darkolivegreen3];
 "ab|e|1"->"ab|0|1";
 "a|bcde|2"->"a|cde|2";
 "a|cde|2"->"ac|de|1";
 "ac|de|1"->"acd|e|0";
 "acd|e|0"[color=darkolivegreen3];
 "ac|de|1"->"ac|e|1";
 "ac|e|1"->"ace|0|0";
 "ace|0|0"[color=darkolivegreen3];
 "ac|e|1"->"ac|0|1";
 "a|cde|2"->"a|de|2";
 "a|de|2"->"ad|e|1";
 "ad|e|1"->"ade|0|0";
 "ade|0|0"[color=darkolivegreen3];
 "ad|e|1"->"ad|0|1";
 "a|de|2"->"a|e|2";
 "a|e|2"->"ae|0|1";
 "a|e|2"->"a|0|2";
 "|abcde|3"->"|bcde|3";
 "|bcde|3"->"b|cde|2";
 "b|cde|2"->"bc|de|1";
 "bc|de|1"->"bcd|e|0";
 "bcd|e|0"[color=darkolivegreen3];
 "bc|de|1"->"bc|e|1";
 "bc|e|1"->"bce|0|0";
 "bce|0|0"[color=darkolivegreen3];
 "bc|e|1"->"bc|0|1";
 "b|cde|2"->"b|de|2";
 "b|de|2"->"bd|e|1";
 "bd|e|1"->"bde|0|0";
 "bde|0|0"[color=darkolivegreen3];
 "bd|e|1"->"bd|0|1";
 "b|de|2"->"b|e|2";
 "b|e|2"->"be|0|1";
 "b|e|2"->"b|0|2";
 "|bcde|3"->"|cde|3";
 "|cde|3"->"c|de|2";
 "c|de|2"->"cd|e|1";
 "cd|e|1"->"cde|0|0";
 "cde|0|0"[color=darkolivegreen3];
 "cd|e|1"->"cd|0|1";
 "c|de|2"->"c|e|2";
 "c|e|2"->"ce|0|1";
 "c|e|2"->"c|0|2";
 "|cde|3"->"|de|3";
 "|de|3"->"d|e|2";
 "d|e|2"->"de|0|1";
 "d|e|2"->"d|0|2";
 "|de|3"->"|e|3";
 "|e|3"->"e|0|2";
 "|e|3"->"|0|3";
"prefix|remaining|k" [shape=Mdiamond];
}

Number of nodes cut from binary tree

From the example where n = 5 and k = 3 we can see that tree of height 3 and three trees of height 2 were cut off. As we visited each of these trees root nodes we get number of nodes cut from complete binary tree to be 1*(2³ - 2) + 3*(2² - 2) = 12
If it were a full binary tree : 2^{5 + 1} - 1 = 63
Number of nodes(calls made to function "print") then comes to 63 - 12 = 51
Result matches one we got by calculating number of calls made to function when n = 5 and k = 3.

Now we have to find how many and how big parts of tree are cut off for every n and k.

From here on I will refer to method call print(prefix + remaining.charAt(0), remaining.substring(1), k-1); as left path or left node (as it is in graphviz graphs) and to print(prefix, remaining.substring(1), k); as right path or right node.

We can see the first, and biggest tree is cut when we go left k times and the height of the tree will be n - k + 1. ( + 1 because we visit the root of the tree we cut).

We can see that every time we take left path k times we get to result no matter how many right paths we took before (or in what order). This is unless the word runs out of letters before we get the k left paths. So we can make maximum of n - k right turns.

Lets take a closer look at example where n = 5 and k = 3:

L - left path
R - right path

The first tree cut we took the paths :
LLL

The next highest trees that will be cut will be ones where we take only one right node, possible combinations are :
RLLL, LRLL, LLRL, LLLR -> three trees of height 2 cut
Here we must note that LLLR is already cut as LLL gave solution in previous step.

To get number of next trees (height 1 -> 0 nodes cut) we'll calculate the possible combinations of two rights and three lefts subtracting already visited paths. Combinations(5,3) - Combinations(4,3) = 10 - 4 = 6 nodes of height 1

We can see the numbers match green nodes on the example graph.

C(n,k) - combinations of k from n
f(n,k) - number binary tree nodes not visited by algorithm

f(n,k) = (2^n-k+1-2) + Σ^n-k_i=1(2^n-k-i+1-2)(C(k+i,k) - C(k+i-1,k))

Explanation:

(2^n-k+1-2) - the highest tree cut, have to bring it out of summation or we'll have to take negative factorials
Σ^n-k_i=1 - sum of all nodes cut, excluding highest tree as it is already added. (We start adding larger trees first)
(2^n-k-i+1-2) - number of nodes cut per tree. n-k+1 is the largest tree, then working down from there up to tree with height "n-k-(n-k)+1 = 1"
(C(k+i,k) - C(k+i-1,k)) - find how many trees of given height there is. First find all possible paths (lefts and rights) and then subtract already visited ones(in previous steps).

Looks awful, but it can be simplified if we assume that k != 0 (If we don't assume that there will be factorials of negative numbers - which is undefined)

Simplified function:
f(n,k) = Σ^n-k_i=0(2^n-k-i+1-2)*C(k+i-1,k-1)

Evaluation of accurate time complexity

The time complexity of the function:
O(2ⁿ- Σ^n-k_i=0(2^n-k-i+1-2)*C(k+i-1,k-1))

Now this looks awful and doesn't give much information. I don't know how to simplify it any further. I've asked about it here. No answer so far though.

But is it even worth considering the f(n,k) part? Probably depends on particular application where it is applied. We can see from the data table that it can considerably affect the algorithms calls depending on the choice of n and k.

To see more visually how much the extra part affects complexity I plotted best time complexity and real complexity on graph.

O(2ⁿ- Σ^n-k_i=0(2^n-k-i+1-2)*C(k+i-1,k-1)) is the colorful surface.
B(2^min(n,k)) is the green surface.

We can see that B(2^min(n,k)) overestimates (tells it works much better than it actually does) the function complexity by quite much. It is usually useful to look algorithms worst case complexity which is W(2^max(n,k))

O(2ⁿ- Σ^n-k_i=0(2^n-k-i+1-2)*C(k+i-1,k-1)) is the colorful surface.
B(2^min(n,k)) is the green surface.
W(2^max(n,k)) is the yellow surface.

Conclusion

Best case complexity: B(2^min(n,k))
Accurate complexity : O(2ⁿ- Σ^n-k_i=0(2^n-k-i+1-2)*C(k+i-1,k-1))
Worst case complexity: W(2^max(n,k)) -> often noted as O(2^max(n,k))

In my opinion worst case complexity should be used to evaluate the function as accurate is too complex to understand what it means without analyzing it further. I wouldn't use best case complexity because it leaves too much to chance. Unfortunately I can't calculate the average complexity for this. Depending on application of the algorithm, using average might be better for algorithm evaluation.

Upvotes: 4

Ami Tavory

Reputation: 76297

Suppose m is the length of prefix, and n is the length of remaining. Then the complexity is given by

T(m, n, k) = Θ(m + n) + T(m + 1, n - 1, k - 1) + T(m, n - 1, k).

The Θ(m + n) term stems from

prefix + remaining.charAt(0), remaining.substring(1)

which, in general will require creating two new strings of lengths about m and n, respectively (this might differ among various implementations).

Beyond that, it's pretty difficult to solve (at least for me), except for some very simple bounds. E.g., it's pretty clear that the complexity is at least exponential in the minimum of the length of the prefix and k, since

T(m, n, k) ≥ 2 T(m, n - 1, k - 1) ↠ T(m, n, k) = Ω(2^{min(n, k)}).

Upvotes: 4

hk6279

Reputation: 1879

Suppose m is the length of prefix, and n is the length of remaining. Then the complexity is given by

T(m, n, k) = 1 + n + 1 + T(m + 1, n - 1, k - 1) + T(m, n - 1, k)

Obviously, the function stop when n=0 or k=0. So,

T(r, n, 0) = 1 + r

T(m, 0, k) = 1 + 1 + 1 = 3

Reform equation 1, we got

T(m, n, k) - T(m, n - 1, k) = 2 + n + T(m + 1, n - 1, k - 1)

Replace n by n-1 in equation 1

T(m, n - 1, k) - T(m, n - 2, k) = 2 + (n - 1) + T(m + 1, n - 2, k - 1)

... continue ...

T(m, 1, k) - T(m, 0, k) = 2 + (1) + T(m + 1, 0, k - 1)

Sum them up

T(m, n, k) - T(m, 0, k) = 2(n) + (n-1)(n)/2 + {Summation of a from 0 to n - 1 on T(m + 1, a, k - 1)}

Reform

T(m, n, k) = n²/2 + 3n/2 +3 + {Summation of a from 0 to n - 1 on T(m + 1, a, k - 1)}

I guess we can get the answer by solving the Summation by using the last equation and the leading factor of the equation would be something like n^k+1

Upvotes: 0