Reputation:
Im trying to construct and compare, the fastest possible way, two 01 random vectors of the same length using Julia, each vector with the same number of zeros and ones.
This is all for a MonteCarlo simulation of the following probabilistic question
We have two independent urns, each one with n white balls and n black balls. Then we take a pair of balls, one of each urn, each time up to empty the urns. What is the probability that each pair have the same color?
What I did is the following:
using Random
# Auxiliar function that compare the parity, element by element, of two
# random vectors of length 2n
function comp(n::Int64)
sum((shuffle!(Vector(1:2*n)) .+ shuffle!(Vector(1:2*n))).%2)
end
The above generate two random permutations of the vector from 1 to 2n, add element by element, apply modulo 2 to each elemnt and after sum all the values of the remaining vector. Then Im using above the parity of each number to model it color: odd black and white even.
If the final sum is zero then the two random vectors had the same colors, element by element. A different result says that the two vectors doesnt had paired colors.
Then I setup the following function, that it is just the MonteCarlo simulation of the desired probability:
# Here m is an optional argument that control the amount of random
# experiments in the simulation
function sim(n::Int64,m::Int64=24)
# A counter for the valid cases
x = 0
for i in 1:2^m
# A random pair of vectors is a valid case if they have the
# the same parity element by element so
if comp(n) == 0
x += 1
end
end
# The estimated value
x/2^m
end
Now I want to know if there is a faster way to compare such vectors. I tried the following alternative construction and comparison for the random vectors
shuffle!( repeat([0,1],n)) == shuffle!( repeat([0,1],n))
Then I changed accordingly the code to
comp(n)
With these changes the code runs slightly slower, what I tested with the function @time
. Other changes that I did was changing the for
statement for a while
statement, but the computation time remain the same.
Because Im not programmer (indeed just yesterday I learn something of the Julia language, and installed the Juno front-end) then probably will be a faster way to make the same computations. Some tip will be appreciated because the effectiveness of a MonteCarlo simulation depends on the number of random experiments, so the faster the computation the larger values we can test.
Upvotes: 2
Views: 569
Reputation: 2004
To get something cleaner, you could generate directly vectors of 0/1 values, and then just let Julia check for vector equality, e.g.
function rndvec(n::Int64)
shuffle!(vcat(zeros(Bool,n),ones(Bool,n)))
end
function sim0(n::Int64, m::Int64=24)
sum(rndvec(n) == rndvec(n) for i in 1:2^m) / 2^m
end
Avoiding allocation makes the code faster, as explained by Bogumił Kamiński (and letting Julia make the comparison is faster than his code).
function sim1(n::Int64, m::Int64=24)
vref = vcat(zeros(Bool,n),ones(Bool,n))
vshuffled = vref[:]
sum(shuffle!(vshuffled) == vref for i in 1:2^m) / 2^m
end
To go even faster use lazy evaluation and fast exit: if the first element is different, you don't even need to generate the rest of the vectors. This would make the code much trickier though.
I find it's a bit not in the spirit of the question, but you could also do some more math.
There is binomial(2*n, n)
possible vectors generated and you could therefore just compute
function sim2(n::Int64, m::Int64=24)
nvec = binomial(2*n, n)
sum(rand(1:nvec) == 1 for i in 1:2^m) / 2^m
end
Here are some timings I obtain:
@time show(("sim0", sim0(6, 21)))
@time show(("sim1", sim1(6, 21)))
@time show(("sim2", sim2(6, 21)))
@time test(("test", test(6, 2^21)))
("sim0", 0.0010724067687988281) 4.112159 seconds (12.68 M allocations: 1.131 GiB, 11.47% gc time)
("sim1", 0.0010781288146972656) 0.916075 seconds (19.87 k allocations: 1.092 MiB)
("sim2", 0.0010628700256347656) 0.249432 seconds (23.12 k allocations: 1.258 MiB)
("test", 0.0010166168212890625) 1.180781 seconds (2.14 M allocations: 98.634 MiB, 2.22% gc time)
Upvotes: 1
Reputation: 69819
The key cost in this problem is shuffle!
therefore in order to maximize the simulation speed you can use (I add it as an answer as it is too long for a comment):
function test(n,m)
ref = [isodd(i) for i in 1:2n]
sum(all(view(shuffle!(ref), 1:n)) for i in 1:m) / m
end
What are the differences from the code proposed in the other answer:
shuffle!
both vectors; it is enough to shuffle!
one of them, as the result of the comparison is invariant to any identical permutation of both vectors after independently shuffling them; therefore we can assume that one vector is after random permutation reshuffled to be ordered so that it has trues
in the first n
entries and falses
in the last n
entriesshuffle!
in-place (i.e. ref
vector is allocated only once)all
function on the fist half of the vector; this way the check is stopped as I hit first false
; if I hit all true
in the first n
entries I do not have to check the last n
entries as I know they are all false
so I do not have to check themUpvotes: 2