user4901852
user4901852

Reputation:

Faster vector comparison in Julia

Im trying to construct and compare, the fastest possible way, two 01 random vectors of the same length using Julia, each vector with the same number of zeros and ones.

This is all for a MonteCarlo simulation of the following probabilistic question

We have two independent urns, each one with n white balls and n black balls. Then we take a pair of balls, one of each urn, each time up to empty the urns. What is the probability that each pair have the same color?

What I did is the following:

using Random
# Auxiliar function that compare the parity, element by element, of two 
# random vectors of length 2n
function comp(n::Int64)
  sum((shuffle!(Vector(1:2*n)) .+ shuffle!(Vector(1:2*n))).%2)
end

The above generate two random permutations of the vector from 1 to 2n, add element by element, apply modulo 2 to each elemnt and after sum all the values of the remaining vector. Then Im using above the parity of each number to model it color: odd black and white even.

If the final sum is zero then the two random vectors had the same colors, element by element. A different result says that the two vectors doesnt had paired colors.

Then I setup the following function, that it is just the MonteCarlo simulation of the desired probability:

  # Here m is an optional argument that control the amount of random
  # experiments in the simulation
  function sim(n::Int64,m::Int64=24)
  # A counter for the valid cases
  x = 0
  for i in 1:2^m
    # A random pair of vectors is a valid case if they have the
    # the same parity element by element so
   if comp(n) == 0
    x += 1
   end
  end
  # The estimated value
   x/2^m
  end

Now I want to know if there is a faster way to compare such vectors. I tried the following alternative construction and comparison for the random vectors

shuffle!( repeat([0,1],n)) == shuffle!( repeat([0,1],n))

Then I changed accordingly the code to

comp(n)

With these changes the code runs slightly slower, what I tested with the function @time. Other changes that I did was changing the forstatement for a whilestatement, but the computation time remain the same.

Because Im not programmer (indeed just yesterday I learn something of the Julia language, and installed the Juno front-end) then probably will be a faster way to make the same computations. Some tip will be appreciated because the effectiveness of a MonteCarlo simulation depends on the number of random experiments, so the faster the computation the larger values we can test.

Upvotes: 2

Views: 569

Answers (2)

One Lyner
One Lyner

Reputation: 2004

To get something cleaner, you could generate directly vectors of 0/1 values, and then just let Julia check for vector equality, e.g.

function rndvec(n::Int64)
 shuffle!(vcat(zeros(Bool,n),ones(Bool,n)))
end

function sim0(n::Int64, m::Int64=24)
  sum(rndvec(n) == rndvec(n) for i in 1:2^m) / 2^m
end

Avoiding allocation makes the code faster, as explained by Bogumił Kamiński (and letting Julia make the comparison is faster than his code).

function sim1(n::Int64, m::Int64=24)
  vref = vcat(zeros(Bool,n),ones(Bool,n))
  vshuffled = vref[:]
  sum(shuffle!(vshuffled) == vref for i in 1:2^m) / 2^m
end

To go even faster use lazy evaluation and fast exit: if the first element is different, you don't even need to generate the rest of the vectors. This would make the code much trickier though.

I find it's a bit not in the spirit of the question, but you could also do some more math. There is binomial(2*n, n) possible vectors generated and you could therefore just compute

function sim2(n::Int64, m::Int64=24)
   nvec = binomial(2*n, n)
   sum(rand(1:nvec) == 1 for i in 1:2^m) / 2^m
end

Here are some timings I obtain:

@time show(("sim0", sim0(6, 21)))
@time show(("sim1", sim1(6, 21)))
@time show(("sim2", sim2(6, 21)))
@time test(("test", test(6, 2^21)))

("sim0", 0.0010724067687988281)  4.112159 seconds (12.68 M allocations: 1.131 GiB, 11.47% gc time)
("sim1", 0.0010781288146972656)  0.916075 seconds (19.87 k allocations: 1.092 MiB)
("sim2", 0.0010628700256347656)  0.249432 seconds (23.12 k allocations: 1.258 MiB)
("test", 0.0010166168212890625)  1.180781 seconds (2.14 M allocations: 98.634 MiB, 2.22% gc time)

Upvotes: 1

Bogumił Kamiński
Bogumił Kamiński

Reputation: 69819

The key cost in this problem is shuffle! therefore in order to maximize the simulation speed you can use (I add it as an answer as it is too long for a comment):

function test(n,m)
  ref = [isodd(i) for i in 1:2n]
  sum(all(view(shuffle!(ref), 1:n)) for i in 1:m) / m
end

What are the differences from the code proposed in the other answer:

  1. You do not have to shuffle! both vectors; it is enough to shuffle! one of them, as the result of the comparison is invariant to any identical permutation of both vectors after independently shuffling them; therefore we can assume that one vector is after random permutation reshuffled to be ordered so that it has trues in the first n entries and falses in the last n entries
  2. I do shuffle! in-place (i.e. ref vector is allocated only once)
  3. I use all function on the fist half of the vector; this way the check is stopped as I hit first false; if I hit all true in the first n entries I do not have to check the last n entries as I know they are all false so I do not have to check them

Upvotes: 2

Related Questions