Remi.b
Remi.b

Reputation: 18219

Create a `Bool` array by randomly selecting values from two other `Bool` arrays

Here are two arrays of true/false

a = [true, true, false, true, false]
b = [true, false, false, false, true]

I am willing to make a new array c where each element of c comes either from a or from b with equal probability

Here are the two solutions I found:

c=trues(5)
for i = 1:5
  c[i] = rand() > 0.5 ? a[i] : b[i]
end

and

c=trues(5)
ab = hcat(a,b)
which = sample([1,2], 5)
for i = 1:5
  c[i] = ab[i,which[i]]
end

Is there a better (faster) solution?

Upvotes: 2

Views: 121

Answers (2)

Mike Satteson
Mike Satteson

Reputation: 736

This is slight variant on jch's answer, but it may be faster to use bit manipulation rather than a comprehension to select the elements of a and b according to the "mask" r:

julia> r = bitrand(5)
5-element BitArray{1}:
  true
  true
 false
 false
 false

julia> r&a | ~r&b
5-element BitArray{1}:
  true
  true
  false
  false
  true

Note that I've corrected bit manipulation answer to this question

In my original answer to this question I said that the way to select a or b according to the mask r is a&r|b. However bothered by the lack of symmetry in this formula, I took a more careful look at this problem, generating a full truth table and correcting the formula to r&a | ~r&b:

julia> r = [trues(4), falses(4)]
julia> a = [trues(2), falses(2)]
julia> a = [a, a]
julia> b = [true, false]
julia> b = [b, b, b, b]
julia> c = r&a | ~r&b
julia> nocigar = a&r|b
julia> q = [r[i] ? a[i] : b[i] for i in 1:length(a)]

julia> all([c[i] == q[i] for i in 1:length(a)])
true

julia> all([nocigar[i] == q[i] for i in 1:length(a)])
false

The original formula goes wrong for one of the 8 entries in the truth table, when the mask is true but a is false, or with a true b incorrectly results in true. The fix is to and the mask with a, the mask's complement with b, oring the result. Here is the truth table:

julia> hcat(a, b, r, c, nocigar)
8x5 Array{Bool,2}:
  true   true   true   true   true
  true  false   true   true   true
 false   true   true  false   true
 false  false   true  false  false
  true   true  false   true   true
  true  false  false  false  false
 false   true  false   true   true
 false  false  false  false  false

And now for a bit of profiling

My intuition was that the bit manipulation solution to this problem would be significantly faster than using a comprehension with the ? operator. Timing these operations shows that bit manipulation is orders of magnitude faster and much less memory intensive for large arrays. Here is what I get when a, b and r have a length of 10^6:

julia> a = bitrand(10^6)
...

julia> @time c = r&a | ~r&b
elapsed time: 0.000607186 seconds (500496 bytes allocated)

julia> @time c = [r[i] ? a[i] : b[i] for i in 1:length(a)]
elapsed time: 0.446756657 seconds (167967416 bytes allocated, 16.91% gc time)

Running these assignments multiple times produces consistent results. For shorter arrays, length 100, there is little difference in execution time, though the comprehension uses more memory:

julia> a = bitrand(100)
...

julia> @time c = r&a | ~r&b
elapsed time: 1.3979e-5 seconds (464 bytes allocated)

julia> @time c = [r[i] ? a[i] : b[i] for i in 1:length(a)]
elapsed time: 5.326e-5 seconds (10520 bytes allocated)

So if you're working with small arrays, either technique is fine, but bit manipulation is way more efficient for very large arrays.

Upvotes: 4

jch
jch

Reputation: 5651

I would say something like:

r = randbit(5)
c = [ r[i] ? a[i] : b[i] for i = 1:5 ]

The first line efficiently generates an array r of random booleans (actually a bitarray, but that's an implementation detail). The second line selects elements from a and b according to the values in r, using an array comprehension, which is equivalent to your for loop but slightly more legible.

Upvotes: 2

Related Questions