Analysis of the running time of concat

Question

This is an example from Richard Bird's Thinking Functionally with Haskell (p. 158). Could someone explain me the reasoning behind 1,2,3 and 4?

EDIT. I understand the first 3 equations formed from the first definition. For 1., why is a summation? How is it tie to T(++)(n, m) = Θ(n)? For the second definition, I understand the first 2 statements. The third one (2.), why k+n? For 3. and 4. I am completely lost

Consider first the following two definitions of concat:

concat xss= foldr (++) [] xss 
concat' xss = foldl (++) [] xss

The two definitions are equivalent provided xss is a finite list. Suppose xss is a list of length m of lists all of length n. Then the first definition gives

T(concat)(m, n) = T(foldr (++) [])(m, n),
T(foldr (++) [])(0, n) = Θ(1), 
T(foldr (++) [])(m+1, n) = T(++)(n, mn) + T(foldr (++) [])(m, n).

The estimate T(++)(n, mn) arises because a list of length n is concatenated with a list of length mn. Since T(++)(n, m) = Θ(n), we obtain

1. T(foldr (++) [])(m,n) = Σ_{k=0}^{m} Θ(n) = Θ(mn)

For the second definition of concat we have

T(concat')(m, n) = T(foldl (++))(0, m, n),  
T(foldl (++))(k, 0, n) = O(1), 
2. T(foldl (++))(k, m+1, n) = T(++)(k, n) + T(foldl (++))(k+n, m, n).

The additional argument k refers to the length of the accumulated list in the second argument of foldl. This time we obtain

3. T(foldl (++)) (k,m,n) = Σ_{j=0}{m-1} Θ(k+jn) = Θ(k+m^2n)

Hence

4. T(concat')(m, n) = Θ(m_2 n)

chi · Accepted Answer

For 1., why is a summation? How is it tie to T(++)(n, m) = Θ(n)?

The summation arises from the inductive definition. Consider this independent definition

g(0,n) = something
g(m+1,n) = f(m,n) + g(m,n)

By induction, we get

g(m,m)
= f(m-1,n) + g(m-1,n)
= f(m-1,n) + f(m-2,n) + g(m-2,n)
= f(m-1,n) + f(m-2,n) + f(m-3,n) + g(m-3,n)
= f(m-1,n) + f(m-2,n) + f(m-3,n) + ... + f(m-m,n) + g(m-m,n)
= f(m-1,n) + f(m-2,n) + f(m-3,n) + ... + f(m-m,n) + something

So, the result is a summation of f(n,x) where x varies, plus a final term something.

In the original definition, g(m,n) is T(foldr (++) [])(m, n), while f(m,n) = T(++)(n, mn). The final term does not matter, asymptotically.

For the second definition, I understand the first 2 statements. The third one (2.), why k+n?

When computing foldl (++) x ys, x is assumed to be a list having length k, and ys to be a list-of-lists having length m+1, whose elements have length n.

Now, the recursive foldl equation is:

foldl (++) x (y:ys) = foldl (++) (x++y) ys

The cost of the recursive call comprises (A) the cost of temp = x++y, which is T(++)(k, n), and (B) the cost of foldl (++) temp ys. For (B), the length of temp is now length x + length y = k+n, while ys has length m (n is unaffected).

So, we get

T(foldl (++))(k, m+1, n) 
= A           + B
= T(++)(k, n) + T(foldl (++))(k+n, m, n).

For 3. and 4. I am completely lost

In 3. we sum, as we did for 1., the term T(++)(k, n) with k varying. Note that k increases by n every time we recurse, so we get

T(++)(k, n) + T(++)(k+n, n) + T(++)(k+n+n, n) + ...
= Σ_{j=0}{m-1} T(++)(k+jn, n)
= Σ_{j=0}{m-1} Θ(k+jn) 
= Θ(k+m^2 n)

In 4. we choose k=0 since in the definition of concat' we initially pick x=[] whose length is 0. So, only Θ(m^2 n) survives.

Analysis of the running time of concat

Answers (1)

Related Questions