branden_burger
branden_burger

Reputation: 339

Julia memoized code for Fuss-Catalan numbers

From the Fuss-Catalan series C{4}_n (see the Online Encyclopedia for Integer Sequences OEIS A002293), I'd like to calculate the nth term using memoization. The code I cooked up below works, but takes about 43s when n=200 on my laptop. Is there a way to speed this up further?

numterms = 20
C4 = Array{BigInt}(numterms+1) # memoization dictionary
fill!(C4,-1) # -1 implies not yet computed
C4[1] = 1 # Base case for n=0, C[n+1] provides nth term

function catalan4(n,C)
    C[n+1] == -1 || return C[n+1]
    sum1 = convert(BigInt,0)
    for i in 1:n
       sum2 = convert(BigInt,0)
       for j in 1:(n-i+1)
            sum3 = convert(BigInt,0)
            for k in 1:(n-i-j+2)
                sum3+= catalan4(k-1,C)*catalan4(n-i-j-k+2,C)
            end
            sum2 +=  catalan4(j-1,C)*sum3          
       end
       sum1 +=  catalan4(i-1,C)*sum2 
    end
    C[n+1] = sum1    
    return sum1
end 

for i in 1:numterms
    println(i,"\t",catalan4(i,C4))
end    

This provides as expected:

1   1
2   4
3   22
4   140
5   969
6   7084
7   53820
8   420732
9   3362260
10  27343888
11  225568798
12  1882933364
13  15875338990
14  134993766600
15  1156393243320
16  9969937491420
17  86445222719724
18  753310723010608
19  6594154339031800
20  57956002331347120

Thanks!

Upvotes: 3

Views: 261

Answers (2)

branden_burger
branden_burger

Reputation: 339

So, after Stefan's answer above - I tried a couple of things. To see if the cuplrit was really Julia's BigInt, I tried the GMP mpz_t integers in a C version of the above code. Indeed, it was about 10 times faster for n=200. However, I sometimes need n=4096, and GMP and C wasn't really going to help me cut down my compute time. I then noticed that the inner loops were also being repeatedly re-calculated. So I memoized both the inner sums as follows, and now the code gives me for n=200, evaluation times of 0.06 seconds (down from 43s) with Julia and BigInts!

numterms = 200
C4 = Array{BigInt}(numterms+1) # memoization dictionary
fill!(C4,-1) # -1 implies not yet computed
C4[1] = 1 # Base case for n=0, C[n+1] provides nth term

# memoize partial sums as well
sum2store = similar(C4)
sum3store = similar(C4)
fill!(sum2store, -1)
fill!(sum3store, -1)

function catalan4(n,C)
    C[n+1] == -1 || return C[n+1]
    sum1 = convert(BigInt,0)
    for i in 1:n
        if sum2store[n-i+1] == -1
            sum2 = convert(BigInt,0)
            for j in 1:(n-i+1)
                if sum3store[n-i-j+2] == -1
                    sum3 = convert(BigInt,0)
                    for k in 1:(n-i-j+2)
                        sum3+= catalan4(k-1,C)*catalan4(n-i-j-k+2,C)
                    end
                    sum3store[n-i-j+2] = sum3
                end
                sum2 +=  catalan4(j-1,C)*sum3store[n-i-j+2]
            end
            sum2store[n-i+1] = sum2
        end
        sum1 +=  catalan4(i-1,C)*sum2store[n-i+1]
    end
    C[n+1] = sum1
    return sum1
end

Now I don't need Julia's BigInt to be fast - I'm just happy I can use it.

Upvotes: 3

StefanKarpinski
StefanKarpinski

Reputation: 33259

I suspect that bad BigInt performance is the culprit here. You might want to try Nemo, which uses the Flint library for arbitrary precision integers, which I understand is considerably more efficient. If you want to stay in standard Julia (Nemo is Julia-based, but differs from Julia in some language semantics, afaik – it's a computer algebra system) and your numbers are limited to being smaller than 2^127, then you could try using Int128 instead – these can be stack-allocated and kept in registers, unlike BigInts, which must be heap-allocated and which LLVM doesn't know how to reason about (it can't transform creation of new BigInts into mutation of existing ones). It might not be too hard to create custom Int256 and Int512 types that use two/four Int128 values to do their arithmetic, especially if you only need to support addition and multiplication.

Upvotes: 5

Related Questions