Reputation: 135

Remove duplicates in Ruby Array

I can very easily remove duplicates in an array by using .uniq, but how would I go about doing it without using the .uniq method?

Upvotes: 3

Answers (5)

Amit Sharma

Reputation: 3477

You can also try this; see the following example.

a = [1, 1, 1, 2, 4, 3, 4, 3, 2, 5, 5, 6]

b = []

a.each{ |aa| b << aa unless b.include?(aa) }

# b now contains the following:

[1, 2, 4, 3, 5, 6]

Alternatively, you can try the following:

a = [1, 1, 1, 2, 4, 3, 4, 3, 2, 5, 5, 6]

b = a & a

# OR

b = a | a

# Both return the following result:

[1, 2, 4, 3, 5, 6]

Upvotes: 1

Cary Swoveland

Reputation: 110755

a = [1, 1, 1, 2, 4, 3, 4, 3, 2, 5, 5, 6]

class Array
  def my_uniq
    self | []
  end
end

a.my_uniq
  #=> [1, 2, 4, 3, 5, 6]

This uses the method Array#|: "Set Union — Returns a new array by joining ary with other_ary, excluding any duplicates and preserving the order from the original array."
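To make the quoted behavior concrete, here is a small sketch of Array#| on its own. The variable names are only for illustration; the point is that a union with an empty array adds nothing but still drops duplicates and keeps first-occurrence order.

```ruby
a = [1, 1, 1, 2, 4, 3, 4, 3, 2, 5, 5, 6]

# Union with an empty array: nothing new is added, duplicates are dropped,
# and the order of first occurrences is kept.
union_with_empty = a | []

# Union with a non-empty array: its unseen elements are appended after
# the receiver's elements.
union_with_other = [3, 1, 3] | [2, 1, 7]

p union_with_empty  # prints [1, 2, 4, 3, 5, 6]
p union_with_other  # prints [3, 1, 2, 7]
```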

Here is a benchmark for the various answers, as well as Array#uniq.

require 'fruity'
require 'set'

def doit(n, m)
  arr = n.times.to_a
  arr = m.times.map { arr.sample }
  compare do
    uniq     { arr.uniq } 
    Schwern  { uniq = []; arr.sort.each { |e| uniq.push(e) if e != uniq[-1]; uniq } }
    Sharma   {b = []; arr.each{ |aa| b << aa unless b.include?(aa) }; b }
    Mihael   { arr.to_set.to_a }
    sawa     { arr.group_by(&:itself).keys }
    Cary     { arr | [] }
  end
end

doit(1_000, 500)
# Schwern is faster than uniq by 19.999999999999996% ± 10.0% (results differ)
# uniq is similar to Cary
# Cary is faster than Mihael by 10.000000000000009% ± 10.0%
# Mihael is similar to sawa
# sawa is faster than Sharma by 5x ± 0.1

doit(100_000, 50_000)
# Schwern is faster than uniq by 50.0% ± 10.0%               (results differ)
# uniq is similar to Cary
# Cary is similar to Mihael
# Mihael is faster than sawa by 10.000000000000009% ± 10.0%
# sawa is faster than Sharma by 310x ± 10.0

"Schwern" and "uniq" return arrays containing the same elements but not in the same order (hence "results differ").

Here is the additional benchmark requested by @Schwern.

def doit1(n)
  arr = n.times.map { rand(n/10) }
  compare do
    uniq     { arr.uniq } 
    Schwern  { uniq = []; arr.sort.each { |e| uniq.push(e) if e != uniq[-1]; uniq } }
    Sharma   {b = []; arr.each{ |aa| b << aa unless b.include?(aa) }; b }
    Mihael   { arr.to_set.to_a }
    sawa     { arr.group_by(&:itself).keys }
    Cary     { arr | [] }
  end
end

doit1(1_000)
# Cary is similar to uniq
# uniq is faster than sawa by 3x ± 1.0
# sawa is similar to Schwern                     (results differ)
# Schwern is similar to Mihael                   (results differ)
# Mihael is faster than Sharma by 2x ± 0.1

doit1(50_000)
# Cary is similar to uniq
# uniq is faster than Schwern by 2x ± 1.0        (results differ)
# Schwern is similar to Mihael                   (results differ)
# Mihael is similar to sawa
# sawa is faster than Sharma by 62x ± 10.0

Upvotes: 5

Schwern

Reputation: 165586

The code for most Ruby methods can be found in the ruby-doc.org API documentation. If you mouse over a method's documentation, a "click to toggle source" button appears. The code is in C, but it's very easy to understand.

if (RARRAY_LEN(ary) <= 1)
    return rb_ary_dup(ary);

if (rb_block_given_p()) {
    hash = ary_make_hash_by(ary);
    uniq = rb_hash_values(hash);
}
else {
    hash = ary_make_hash(ary);
    uniq = rb_hash_values(hash);
}

If the array has at most one element, return a copy of it. Otherwise turn the elements into hash keys, then turn the hash back into an array. By a documented property of Ruby hashes, "Hashes enumerate their values in the order that the corresponding keys were inserted", so this technique preserves the original order of the elements in the Array. In other languages it may not.
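The same idea can be sketched in plain Ruby. This is only an illustration of "elements become hash keys", not the actual C implementation; uniq_via_hash is a hypothetical helper name.

```ruby
# Hypothetical helper: de-duplicate by using the elements as hash keys.
def uniq_via_hash(array)
  seen = {}
  # Assigning to a key that already exists does not add a second entry
  # and does not move the key, so duplicates collapse.
  array.each { |element| seen[element] = true }
  # Keys come back in insertion order, i.e. order of first occurrence.
  seen.keys
end

result = uniq_via_hash([2, 3, 4, 5, 1, 2, 4, 5])
p result  # prints [2, 3, 4, 5, 1]
```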

Alternatively, use a Set. A set will never have duplicates. Loading set adds the method to_set to all Enumerable objects, which includes Arrays. However, a Set is usually implemented as a Hash, so you're doing the same thing. If you want a unique array, and if you don't need the elements to be ordered, you should probably instead make a set and use that:

unique = array.to_set
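A minimal sketch of working with the set directly, and of converting back only when an Array is really needed. The sample data and variable names are made up for illustration.

```ruby
require 'set'

array = [2, 3, 4, 5, 1, 2, 4, 5]

# Build the set once; duplicates are dropped on the way in.
unique = array.to_set

# Adding an element that is already present leaves the set unchanged.
unique << 4

# Membership checks are the main reason to keep the data as a Set.
has_four = unique.include?(4)
has_nine = unique.include?(9)

# Convert back only if an Array is required.
unique_array = unique.to_a

p unique.size  # prints 5
```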

Alternatively, sort the Array and loop through it pushing each element onto a new Array. If the last element on the new Array matches the current element, discard it.

array = [2, 3, 4, 5, 1, 2, 4, 5]
uniq = []

# This copies the whole array and the duplicates, wasting
# memory.  And sort is O(nlogn).
array.sort.each { |e|
  uniq.push(e) if e != uniq[-1]
}

puts uniq.inspect
# => [1, 2, 3, 4, 5]

This method is to be avoided because it is slower and takes more memory than the other methods. The sort makes it slower: sorting is O(n log n), meaning that as the array gets bigger, the sorting time grows faster than the array does. It also requires you to copy the whole array, duplicates included, unless you are willing to alter the original data by sorting in place with sort!.

The other methods are O(n) in time and O(n) in memory, meaning they scale linearly as the array gets bigger. They also don't have to copy the duplicates, which can save a substantial amount of memory.

Upvotes: 4

sawa

Reputation: 168269

array.group_by(&:itself).keys
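A short sketch of why this works, assuming a Ruby version that provides Object#itself (2.2 or later): group_by builds a hash keyed by each distinct value, so its keys are the unique elements in order of first occurrence. The sample data is made up for illustration.

```ruby
array = [1, 1, 2, 4, 3, 4]

# Each distinct value becomes a key; the value is the list of its occurrences.
groups = array.group_by(&:itself)

# The keys alone are the de-duplicated elements.
unique = groups.keys

p unique  # prints [1, 2, 4, 3]
```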


Upvotes: 2

Mihail Petkov

Reputation: 1545

You can use #to_set (available after require 'set'), then call to_a if you need an array back.

Upvotes: 3
