Reputation: 5928

processing array with duplicates

I have an array

a = ['A', 'B', 'B', 'C', 'D', 'D']

and I have to go thru all the elements, do something depending on whether the is the last occurance or not, and remove the element after processing it.

The elements are already sorted if that matters.

I'm looking for something efficient. Any suggestions?

Her what I have until now. THIS WORKS AS EXPECTED but not sure it is very efficient.

    a = ['A', 'B', 'B', 'C', 'D', 'D']

while !a.empty?
  b = a.shift

  unless a.count(b) > 0
    p "unique #{b}"
  else
    p "duplicate #{b}"
  end
end

and it produces

"unique A"
"duplicate B"
"unique B"
"unique C"
"duplicate D"
"unique D"

Thanks

Upvotes: 1

Answers (4)

Cary Swoveland

Reputation: 110675

The OP stated that the elements of a are sorted, but that is not required by the method I propose. It also maintains array-order, which could be important for the "do something" code performed for each element to be removed. It does so with no performance penalty over the case where the array is already sorted.

For the array

['A', 'B', 'D', 'C', 'B', 'D']

I assume that some code is to be executed for 'A', 'C' the second 'B' and the second 'D', in that order, after which a new array

['B', 'D']

is returned.

Code

def do_something(e) end

def process_last_dup(a)    
  a.dup.
    tap do |b|
      b.each_with_index.
        reverse_each.
        uniq(&:first).
        reverse_each { |_,i| do_something(a[i]) }.
        each { |_,i| b.delete_at(i) }
    end
end

Example

a = ['A', 'B', 'B', 'C', 'D', 'D']

process_last_dup(a)
  #=> ["B", "D"]

Explanation

The steps are as follows.

b = a.dup
  #=> ["A", "B", "B", "C", "D", "D"]
c = b.each_with_index
  #=> #<Enumerator: ["A", "B", "B", "C", "D", "D"]:each_with_index>
d = c.reverse_each
  #=> #<Enumerator: #<Enumerator: ["A",..., "D"]:each_with_index>:reverse_each>

Notice that d can be thought of as a "compound" enumerator. We can convert it to an array to see the elements it will generate and pass to uniq.

d.to_a
  #=> [["D", 5], ["D", 4], ["C", 3], ["B", 2], ["B", 1], ["A", 0]]

Continuing,

e = d.uniq(&:first)
  #=> [["D", 5], ["C", 3], ["B", 2], ["A", 0]]
e.reverse_each { |_,i| do_something(a[i]) }

reverse_each is used so that do_something is first executed for 'A', then for the second 'B', and so on.

e.each { |_,i| b.delete_at(i) }
b #=> ["B", "D"]

If a is to be modified in place replace a.dup. with a..

Readers may have noticed that the code I gave at the beginning used Object#tap so that tap's block variable b, which initially equals a.dup, will be returned after it has been modified within tap's block, rather than explicitly setting b = a.sup at the beginning and b at the end, as I've done in my step-by-step explanation. Both approaches yield the same result, of course.

The doc for Enumerable#uniq does not specify whether the first element is kept, but it does reference Array.uniq, which does keep the first. If there is any uneasiness about that one could always replace reverse_each with reverse so that Array.uniq would be used.

Upvotes: 1

sschmeck

Reputation: 7685

Quite similar to the answer of @GaganGami but using chunk_while.

a.chunk_while { |a,b| a == b }
 .each do |*list,last|
   list.each { |e| puts "duplicate #{e}" }
   puts "unique #{last}"
 end

chunk_whilesplits the array into sub arrays when the element changes.

['A', 'B', 'B', 'C', 'D', 'D'].chunk_while { |a,b| a == b }.to_a
# => [["A"], ["B", "B"], ["C"], ["D", "D"]]

Upvotes: 1

Gagan Gami

Reputation: 10251

Simple way:

array = ["A", "B", "B", "C", "D", "D"]

array.group_by{|e| e}.each do |key,value| 
  *duplicate,  uniq = value
  duplicate.map do |e|
    puts "Duplicate #{e}"
  end
  puts "Unique #{uniq}"
end

As per Stefan's comment and suggestion, shorter way is:

array.chunk_while(&:==).each do |*duplicate, uniq|
  duplicate.map do |e|
    puts "Duplicate #{e}"
  end
  puts "Unique #{uniq}"
end


# Above both will give the same Output:
---------------------------------------
Unique A
Duplicate B
Unique B
Unique C
Duplicate D
Unique D

Upvotes: 4

user94559

Reputation: 60143

Based on your code and expected output, I think this is an efficient way to do what you're looking for:

a = ['A', 'B', 'B', 'C', 'D', 'D']

a.each_index do |i|
  if i < a.length - 1 && a[i+1] == a[i]
    puts "This is not the last occurrence of #{a[i]}"
  else
    puts "This is the last occurrence of #{a[i]}"
  end
end

# Output:
# This is the last occurrence of A
# This is not the last occurrence of B
# This is the last occurrence of B
# This is the last occurrence of C
# This is not the last occurrence of D
# This is the last occurrence of D

But I want to reiterate the importance of the wording in my output versus yours. This is not about whether the value is unique or not in the input. It seems to be about whether the value is the last occurrence within the input or not.

Upvotes: 1

processing array with duplicates

Answers (4)

Related Questions