Reputation: 8424

How slice an array while avoiding duplicate values in each slice?

Suppose I have this array:

a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]

a.each_slice(2).to_a will generate pairs, but these pairs will contain non-unique values like [3,3]. So I guess I'm looking for some sort of unique_each_slice method.

What I want is to be able to keep shuffling this array until I get to a point where I have unique pairs of 2 (doesn't have to be 2, can be anything), like this (using 2 an example):

[3, 1, 3, 7, 6, 3, 4, 5, 8, 3, 9, 3, 2, 3, 6, 3, 3, 11, 10, 3]

If you do each_slice(2) on this array, you'll get unique pairs:

[[3, 1], [3, 7], [6, 3], [4, 5], [8, 3], [9, 3], [2, 3], [6, 3], [3, 11], [10, 3]]

compared to the original where you have:

[[1, 2], [3, 3], [3, 3], [3, 3], [3, 3], [3, 4], [5, 6], [6, 7], [8, 9], [10, 11]]

with non-unique pairs in each, like [3,3]

Another example, suppose I have:

a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11,12,13,14,15,16,17]

Now, supposing, there's some function a.unique_slices_of(3), I'd get:

[[4, 16, 3], [1, 9, 3], [3, 6, 17], [3, 6, 10], [15, 3, 2], [3, 8, 12], [11, 3, 14], [7, 13, 3], [3, 5]]

By "unique slice" I mean a slice where the same number doesn't repeat twice: [1,2,3] is a unique slice, [3,1,3] is not.

So far I've come with the following method which seems to take several iterations before it gets things right:

class Array
  def unique_slices_of!(slices)
    loop do
      unique = true
      self.each_slice(slices) do |slice|
        if slice != slice.uniq
          self.shuffle!
          unique = false # so we know whether to loop again
          break
        end
      end
      break if unique # if unique didn't change, that means all slices were equal
      if unique == false then unique == true end # reset and start again
    end
    self 
  end
end

The major problem with my code is that a) I don't think I'm using some idiomatic Ruby method which can shorten this process in half or more. b) The possibility for an infinite loop if the array simply cannot contain unique slices. I'd probably need to use some theory of combinations here, but I'm not sure how.

Upvotes: 3

Answers (4)

Todd A. Jacobs

Reputation: 84403

Sampling Unique Combinations

If you're looking for something a little more idiomatic, and if efficiency of the algorithm isn't your primary concern, you might try the following:

a = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
a.combination(2).reject { |pair| pair[0] == pair[1] }.sample(a.size / 2)

The main downside to this approach is speed when a is large, because Array#combination will generate all possible combinations before you winnow down the results with Array#reject and Array#sample. However, for arrays of modest size it certainly seems fast enough.

Evaluating the Performance of the Solution

Casual testing suggests this is more than fast enough for arrays of modest size. Consider:

require 'benchmark'

a = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]

Benchmark.measure do
  a.combination(2).reject { |pair| pair[0] == pair[1] }.sample(a.size / 2)
end.to_s
#=> "  0.000000   0.000000   0.000000 (  0.000052)\n"

Even at 100,000 iterations, it still took only 3.650299 seconds on my system. That seems fast enough for practical use given your posted corpus, but your mileage may vary.

Allowing Comparisons of Arbitrary Sub-Array Sizes

Comparing Members with Count

In comments, the OP asked if this could be generalized to winnow sub-arrays with 2, 3, or 4 elements each. Yes, with a little refactoring, although the performance degrades as the number of elements in the combination increases. Consider:

array = [1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 5, 6, 6, 7, 8, 9, 10, 11]
element_size = 4 

array.combination(element_size).
  reject { |element| element.map { |member| element.count(member) > 1 }.any? }.
      sample(array.size / element_size)

This uses the desired element_size to determine the number of samples to take dynamically. This has the side-benefit of dropping any partially-filled arrays, doing away with the "dangling" elements you'd get with #each_slice.

The workhorse here is still the reject method, which now iterates over each member of each sub-array using #count, and rejects elements that have #any? members that appear more than once in that sub-array. Even with better variable names, it's a little more difficult to follow than when we have a fixed element size, but it's certainly more flexible.

A More Readable (and Slightly Faster) Comparison

With a hat tip to @pguardiario (see this related answer), you can even shorten this up a bit more and make it more readable by selecting only sub-arrays where all the array members are #uniq. For example:

array.combination(element_size).
  select { |subarray| subarray == subarray.uniq }.
    sample(array.size / element_size)

Upvotes: 2

Wayne Conrad

Reputation: 108099

A non-random way to do it

The idea here is to put distinct values into bins. Then, while there are any bins remaining:

Order the bins by their size, largest bins first
Make a slice by take a number for each of the first max_slice_size bins
Remove empty bins

Because each value within a slice is taken from a different bin, it is guaranteed that the slice will contain distinct values.

The code:

def slices_without_repeats(a, max_slice_size)
  slices = []
  bins = a.group_by { |e| e }.values
  until bins.empty?
    bins = bins.sort_by(&:size).reverse
    slice_size = [max_slice_size, bins.size].min
    slice = slice_size.times.map do |i|
      bins[i].pop
    end
    slices << slice
    bins.reject!(&:empty?)
    if slice.size < max_slice_size && !bins.empty?
      raise ArgumentError, "An element repeats too much"
    end
  end
  slices
end

This algorithm uses no explicit randomness. It does use Ruby's quicksort, which is non-stable, and could potentially exploit randomness (as when selecting pivot points).

In use:

a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]
p slices_without_repeats(a, 2)
# [[3, 6], [3, 9], [3, 7], [3, 2], [3, 6],
#  [3, 10], [3, 11], [3, 4], [1, 8], [3, 5]]

It detects when it cannot be done:

p slices_without_repeats(a, 3)
# An element repeats too much (ArgumentError)

And it handles the case where the last slice is not full:

p slices_without_repeats([1, 2, 3, 3, 4, 4, 4], 3)
# [[4, 3, 2], [4, 3, 1], [4]]

Upvotes: 0

pguardiario

Reputation: 55002

a = [1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11]

You can test if slices are "unique" by:

a.each_slice(2).all?{|x| x == x.uniq}

So now you just shuffle until you get what you want:

a.shuffle! until a.each_slice(2).all?{|x| x == x.uniq}

The easiest way to avoid an infinite loop is with timeout:

require 'timeout'
# raise an error if it takes more than 1 second
timeout(1){ a.shuffle! until a.each_slice(3).all?{|x| x == x.uniq} }

Upvotes: 2

rohit89

Reputation: 5773

I have a solution that seems to work. The basic idea is to distribute the elements with maximum counts into as many slices as possible. Add a few shuffle to make it seem random.

class Array
  def unique_slices_of(slice_length)
    buf = []
    arr = []
    hash = Hash.new 0
    self.each {|i| hash[i] += 1}
    sorted = hash.sort_by {|k, v| v}.reverse
    # sorted[][0] holds the element and sorted[][1] holds the count
    return nil if sorted[0][1] > ((self.length * 1.0) / slice_length).ceil
    index = 0
    until sorted.length.zero?
      # Add element to buf and decrement count
      # if count == 0, remove the entry from sorted
      buf << sorted[index][0]
      sorted[index][1] -= 1
      if sorted[index][1] == 0
        sorted.delete_at index
        break if sorted.length == 0
        index -= 1
      end
      index = (index + 1) % sorted.length
      if buf.length == slice_length
        arr << buf.shuffle
        buf.clear
        index = 0
      end
    end
    arr << buf.shuffle if buf.length > 0
    arr.shuffle
  end
end

Output:

[3, 1, 3, 7, 6, 3, 4, 5, 8, 3, 9, 3, 2, 3, 6, 3, 3, 11, 10, 3].unique_slices_of(2)
#=> [[8, 3], [3, 6], [3, 5], [1, 10], [3, 4], [3, 6], [7, 3], [9, 3], [3, 11], [3, 2]]

[1,2,3,3,3,3,3,3,3,3,3,4,5,6,6,7,8,9,10,11,12,13,14,15,16,17].unique_slices_of(3)
#=> [[3, 9, 6], [6, 14, 3], [3, 2, 17], [7, 3, 16], [3, 11, 10], [15, 3, 8], [4, 3, 5], [3, 13, 12], [1, 3]]