GMDev
GMDev

Reputation: 93

Column Based Backwards Substitution Flop count

I have a function I'm trying to do a flop count on , but I keep getting 2n instead of n^2. I know its supposed to be n^2 based on the fact it's still a nxn triangular system that is just being solved in column first order. I'm new to linear algebra so please forgive me. I'll include the function , as well as all my work shown the best I can. Column BS Function

function col_bs(U, b)

    n = length(b)
    x = copy(b)

    for j = n:-1:2
        if U[j,j] == 0
            error("Error: Matrix U is singular.")
        end
        x[j] = x[j]/U[j,j] 

        for i=1:j-1
            x[i] = x[i] - x[j] * U[i , j ]
        end
    end

    x[1] = x[1]/U[1,1]


    return x
end
  1. To start 2 flops for the addition and multiplication ð‘Ĩ[𝑖]−ð‘Ĩ[𝑗]∗𝑈[𝑖,𝑗]

The 𝑖 loop does: ∑𝑖=𝑗−112

  1. 1 flop for the division ð‘Ĩ[𝑗]/=𝑈[𝑗,𝑗]

  2. Inside the for 𝑗 loop in total does: 1+∑𝑖=𝑗−112

  3. The 𝑗 loop itself does: ∑𝑗=2𝑛(1+∑𝑖=𝑗−112))

  4. Then one final flop for ð‘Ĩ[1]=ð‘Ĩ[1]/𝑈[1,1].

  5. Finally we have 1+(∑𝑗=2𝑛(1+∑𝑖=𝑗−112))).

Which we can now break down.

If we distribute and simplify 1+(∑𝑗=2𝑛+∑𝑗=2𝑛∑𝑖=𝑗−112).

We can look at only the significant variables and ignore constants,

1+(𝑛+𝑛(1))(𝑛+𝑛(1))𝑛+𝑛2𝑛

Which then means that if we ignore constants the highest possibility of flops for this formula would be 𝑛 ( which may be a hint to whats wrong with my function since it should be 𝑛2 just like the rest of our triangular systems I believe) Proof

Proof

Upvotes: 4

Views: 217

Answers (1)

bardo
bardo

Reputation: 428

Since the code has two nested for-loops, each one proportional to n, a quadratic runtime can be expected.

using LinearAlgebra, BenchmarkTools

function col_bs(U, b)
    n = length(b)
    x = copy(b)
    for j = n:-1:2                          # n*(
        if U[j,j] == 0
            error("Error: Matrix U is singular.")
        end
        x[j] = x[j]/U[j,j]                  #    1 +
        
        for i=1:j-1                         #    n*(
            x[i] = x[i] - x[j] * U[i , j ]  #      2
        end                                 #    )
    end                                     # ) +
    x[1] = x[1]/U[1,1]                      # 1
    return x                                # = n*(1+2*n) + 1 = O(n^2)
end

for n in 2 .^[1:10...]
    local U = triu(randn(n,n))
    local b = randn(n,1)
    @btime col_bs($U, $b)
end

nicely approaches the expected factor-of-4 increase in runtime:

  61.366 ns (1 allocation: 80 bytes)
  90.900 ns (1 allocation: 96 bytes)
  147.557 ns (1 allocation: 128 bytes)
  280.000 ns (1 allocation: 192 bytes)
  1.100 Ξs (1 allocation: 336 bytes)
  3.900 Ξs (1 allocation: 576 bytes)
  15.600 Ξs (1 allocation: 1.06 KiB)
  56.800 Ξs (1 allocation: 2.12 KiB)
  219.200 Ξs (1 allocation: 4.12 KiB)
  957.200 Ξs (1 allocation: 8.12 KiB)

Upvotes: 2

Related Questions