Reputation: 8424
Suppose I have this array:
a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]
a.each_slice(2).to_a will generate pairs, but these pairs will contain non-unique values like [3,3]. So I guess I'm looking for some sort of unique_each_slice method.
What I want is to be able to keep shuffling this array until I get to a point where I have unique pairs of 2 (doesn't have to be 2, can be anything), like this (using 2 an example):
[3, 1, 3, 7, 6, 3, 4, 5, 8, 3, 9, 3, 2, 3, 6, 3, 3, 11, 10, 3]
If you do each_slice(2) on this array, you'll get unique pairs:
[[3, 1], [3, 7], [6, 3], [4, 5], [8, 3], [9, 3], [2, 3], [6, 3], [3, 11], [10, 3]]
compared to the original where you have:
[[1, 2], [3, 3], [3, 3], [3, 3], [3, 3], [3, 4], [5, 6], [6, 7], [8, 9], [10, 11]]
with non-unique pairs in each, like [3,3]
Another example, suppose I have:
a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11,12,13,14,15,16,17]
Now, supposing, there's some function a.unique_slices_of(3), I'd get:
[[4, 16, 3], [1, 9, 3], [3, 6, 17], [3, 6, 10], [15, 3, 2], [3, 8, 12], [11, 3, 14], [7, 13, 3], [3, 5]]
By "unique slice" I mean a slice where the same number doesn't repeat twice: [1,2,3] is a unique slice, [3,1,3] is not.
So far I've come with the following method which seems to take several iterations before it gets things right:
class Array
def unique_slices_of!(slices)
loop do
unique = true
self.each_slice(slices) do |slice|
if slice != slice.uniq
self.shuffle!
unique = false # so we know whether to loop again
break
end
end
break if unique # if unique didn't change, that means all slices were equal
if unique == false then unique == true end # reset and start again
end
self
end
end
The major problem with my code is that a) I don't think I'm using some idiomatic Ruby method which can shorten this process in half or more. b) The possibility for an infinite loop if the array simply cannot contain unique slices. I'd probably need to use some theory of combinations here, but I'm not sure how.
Upvotes: 3
Views: 1007
Reputation: 84403
If you're looking for something a little more idiomatic, and if efficiency of the algorithm isn't your primary concern, you might try the following:
a = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
a.combination(2).reject { |pair| pair[0] == pair[1] }.sample(a.size / 2)
The main downside to this approach is speed when a is large, because Array#combination will generate all possible combinations before you winnow down the results with Array#reject and Array#sample. However, for arrays of modest size it certainly seems fast enough.
Casual testing suggests this is more than fast enough for arrays of modest size. Consider:
require 'benchmark'
a = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
Benchmark.measure do
a.combination(2).reject { |pair| pair[0] == pair[1] }.sample(a.size / 2)
end.to_s
#=> " 0.000000 0.000000 0.000000 ( 0.000052)\n"
Even at 100,000 iterations, it still took only 3.650299 seconds on my system. That seems fast enough for practical use given your posted corpus, but your mileage may vary.
In comments, the OP asked if this could be generalized to winnow sub-arrays with 2, 3, or 4 elements each. Yes, with a little refactoring, although the performance degrades as the number of elements in the combination increases. Consider:
array = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
element_size = 4
array.combination(element_size).
reject { |element| element.map { |member| element.count(member) > 1 }.any? }.
sample(array.size / element_size)
This uses the desired element_size to determine the number of samples to take dynamically. This has the side-benefit of dropping any partially-filled arrays, doing away with the "dangling" elements you'd get with #each_slice.
The workhorse here is still the reject method, which now iterates over each member of each sub-array using #count, and rejects elements that have #any? members that appear more than once in that sub-array. Even with better variable names, it's a little more difficult to follow than when we have a fixed element size, but it's certainly more flexible.
With a hat tip to @pguardiario (see this related answer), you can even shorten this up a bit more and make it more readable by selecting only sub-arrays where all the array members are #uniq. For example:
array.combination(element_size).
select { |subarray| subarray == subarray.uniq }.
sample(array.size / element_size)
Upvotes: 2
Reputation: 108099
The idea here is to put distinct values into bins. Then, while there are any bins remaining:
max_slice_size
binsBecause each value within a slice is taken from a different bin, it is guaranteed that the slice will contain distinct values.
The code:
def slices_without_repeats(a, max_slice_size)
slices = []
bins = a.group_by { |e| e }.values
until bins.empty?
bins = bins.sort_by(&:size).reverse
slice_size = [max_slice_size, bins.size].min
slice = slice_size.times.map do |i|
bins[i].pop
end
slices << slice
bins.reject!(&:empty?)
if slice.size < max_slice_size && !bins.empty?
raise ArgumentError, "An element repeats too much"
end
end
slices
end
This algorithm uses no explicit randomness. It does use Ruby's quicksort, which is non-stable, and could potentially exploit randomness (as when selecting pivot points).
In use:
a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]
p slices_without_repeats(a, 2)
# [[3, 6], [3, 9], [3, 7], [3, 2], [3, 6],
# [3, 10], [3, 11], [3, 4], [1, 8], [3, 5]]
It detects when it cannot be done:
p slices_without_repeats(a, 3)
# An element repeats too much (ArgumentError)
And it handles the case where the last slice is not full:
p slices_without_repeats([1, 2, 3, 3, 4, 4, 4], 3)
# [[4, 3, 2], [4, 3, 1], [4]]
Upvotes: 0
Reputation: 55002
a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]
You can test if slices are "unique" by:
a.each_slice(2).all?{|x| x == x.uniq}
So now you just shuffle until you get what you want:
a.shuffle! until a.each_slice(2).all?{|x| x == x.uniq}
The easiest way to avoid an infinite loop is with timeout
:
require 'timeout'
# raise an error if it takes more than 1 second
timeout(1){ a.shuffle! until a.each_slice(3).all?{|x| x == x.uniq} }
Upvotes: 2
Reputation: 5773
I have a solution that seems to work. The basic idea is to distribute the elements with maximum counts into as many slices as possible. Add a few shuffle
to make it seem random.
class Array
def unique_slices_of(slice_length)
buf = []
arr = []
hash = Hash.new 0
self.each {|i| hash[i] += 1}
sorted = hash.sort_by {|k, v| v}.reverse
# sorted[][0] holds the element and sorted[][1] holds the count
return nil if sorted[0][1] > ((self.length * 1.0) / slice_length).ceil
index = 0
until sorted.length.zero?
# Add element to buf and decrement count
# if count == 0, remove the entry from sorted
buf << sorted[index][0]
sorted[index][1] -= 1
if sorted[index][1] == 0
sorted.delete_at index
break if sorted.length == 0
index -= 1
end
index = (index + 1) % sorted.length
if buf.length == slice_length
arr << buf.shuffle
buf.clear
index = 0
end
end
arr << buf.shuffle if buf.length > 0
arr.shuffle
end
end
Output:
[3, 1, 3, 7, 6, 3, 4, 5, 8, 3, 9, 3, 2, 3, 6, 3, 3, 11, 10, 3].unique_slices_of(2)
#=> [[8, 3], [3, 6], [3, 5], [1, 10], [3, 4], [3, 6], [7, 3], [9, 3], [3, 11], [3, 2]]
[1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11,12,13,14,15,16,17].unique_slices_of(3)
#=> [[3, 9, 6], [6, 14, 3], [3, 2, 17], [7, 3, 16], [3, 11, 10], [15, 3, 8], [4, 3, 5], [3, 13, 12], [1, 3]]
Upvotes: 0