Reputation: 339
From the Fuss-Catalan series C{4}_n (see the Online Encyclopedia for Integer Sequences OEIS A002293), I'd like to calculate the nth term using memoization. The code I cooked up below works, but takes about 43s when n=200 on my laptop. Is there a way to speed this up further?
numterms = 20
C4 = Array{BigInt}(numterms+1) # memoization dictionary
fill!(C4,-1) # -1 implies not yet computed
C4[1] = 1 # Base case for n=0, C[n+1] provides nth term
function catalan4(n,C)
C[n+1] == -1 || return C[n+1]
sum1 = convert(BigInt,0)
for i in 1:n
sum2 = convert(BigInt,0)
for j in 1:(n-i+1)
sum3 = convert(BigInt,0)
for k in 1:(n-i-j+2)
sum3+= catalan4(k-1,C)*catalan4(n-i-j-k+2,C)
end
sum2 += catalan4(j-1,C)*sum3
end
sum1 += catalan4(i-1,C)*sum2
end
C[n+1] = sum1
return sum1
end
for i in 1:numterms
println(i,"\t",catalan4(i,C4))
end
This provides as expected:
1 1
2 4
3 22
4 140
5 969
6 7084
7 53820
8 420732
9 3362260
10 27343888
11 225568798
12 1882933364
13 15875338990
14 134993766600
15 1156393243320
16 9969937491420
17 86445222719724
18 753310723010608
19 6594154339031800
20 57956002331347120
Thanks!
Upvotes: 3
Views: 261
Reputation: 339
So, after Stefan's answer above - I tried a couple of things. To see if the cuplrit was really Julia's BigInt, I tried the GMP mpz_t integers in a C version of the above code. Indeed, it was about 10 times faster for n=200. However, I sometimes need n=4096, and GMP and C wasn't really going to help me cut down my compute time. I then noticed that the inner loops were also being repeatedly re-calculated. So I memoized both the inner sums as follows, and now the code gives me for n=200, evaluation times of 0.06 seconds (down from 43s) with Julia and BigInts!
numterms = 200
C4 = Array{BigInt}(numterms+1) # memoization dictionary
fill!(C4,-1) # -1 implies not yet computed
C4[1] = 1 # Base case for n=0, C[n+1] provides nth term
# memoize partial sums as well
sum2store = similar(C4)
sum3store = similar(C4)
fill!(sum2store, -1)
fill!(sum3store, -1)
function catalan4(n,C)
C[n+1] == -1 || return C[n+1]
sum1 = convert(BigInt,0)
for i in 1:n
if sum2store[n-i+1] == -1
sum2 = convert(BigInt,0)
for j in 1:(n-i+1)
if sum3store[n-i-j+2] == -1
sum3 = convert(BigInt,0)
for k in 1:(n-i-j+2)
sum3+= catalan4(k-1,C)*catalan4(n-i-j-k+2,C)
end
sum3store[n-i-j+2] = sum3
end
sum2 += catalan4(j-1,C)*sum3store[n-i-j+2]
end
sum2store[n-i+1] = sum2
end
sum1 += catalan4(i-1,C)*sum2store[n-i+1]
end
C[n+1] = sum1
return sum1
end
Now I don't need Julia's BigInt to be fast - I'm just happy I can use it.
Upvotes: 3
Reputation: 33259
I suspect that bad BigInt
performance is the culprit here. You might want to try Nemo, which uses the Flint library for arbitrary precision integers, which I understand is considerably more efficient. If you want to stay in standard Julia (Nemo is Julia-based, but differs from Julia in some language semantics, afaik – it's a computer algebra system) and your numbers are limited to being smaller than 2^127, then you could try using Int128
instead – these can be stack-allocated and kept in registers, unlike BigInts
, which must be heap-allocated and which LLVM doesn't know how to reason about (it can't transform creation of new BigInts
into mutation of existing ones). It might not be too hard to create custom Int256
and Int512
types that use two/four Int128
values to do their arithmetic, especially if you only need to support addition and multiplication.
Upvotes: 5