Why is uniq needed in this problem? (Ruby)

Question

New to coding, and trying to study independently. Working on a problem on Ruby, where we create a method (filter_out!) that mutates the original array to remove true elements when the array is passed through a proc. The only stipulation is that we aren't allowed to use Array#reject!.

Can anyone explain to me why

arr.uniq.each { |ele| arr.delete(ele) if prc.call(ele)}

works and

arr.each { |ele| arr.delete(ele) if prc.call(ele)}

does not? Here are two example problems:

arr_2 = [1, 7, 3, 5 ]

filter_out!(arr_2) { |x| x.odd? }

p arr_2     # []

arr_3 = [10, 6, 3, 2, 5 ]

filter_out!(arr_3) { |x| x.even? }

p arr_3     # [3, 5]

I've looked it up and understand #uniq removes duplicate values, right?

Any insight would be greatly appreciated-thank you!

edit: Looked into it more, is it because uniq will use the return value of the proc for comparison? Still not sure why this is needed for some numbers, but not all.

J&#246;rg W Mittag · Accepted Answer

The only stipulation is that we aren't allowed to use Array#reject!.

This is an incredibly braindead stipulation. You could just use Array#select! and invert the condition, or use Array#reject together with Array#replace, for example.

Can anyone explain to me why

arr.uniq.each { |ele| arr.delete(ele) if prc.call(ele)}

works and

arr.each { |ele| arr.delete(ele) if prc.call(ele)}

Whenever you don't quite understand what a piece of code does, a good idea is to just run it yourself with a pen and a piece of paper.

So, let's just do that. Assume arr = [1, 2, 4, 5, 2] and prc = -> e { e.even? }.

Array#each iterates through the array, we don't exactly know how it does it (that is the whole idea of abstraction), but we can imagine that it keeps some kind of index to remember which part of the array it is currently at.

So, during the first iteration of the array, the index is at the first element, and the array looks like this:

 [1, 2, 4, 5, 2]
# ↑

Array#each passes 1 to the block, which in turn passes it to prc, which returns false, thus the block doesn't do anything.

Array#each increases the index, so now our array looks like this:

 [1, 2, 4, 5, 2]
#    ↑

Array#each passes 2 to the block, which in turn passes it to prc, which returns true. As a result, the block now passes 2 to Array#delete, which deletes every element from the array that is equal to 2. So, now the array looks like this:

 [1, 4, 5]
#    ↑

Array#each increases the index, so now our array looks like this:

 [1, 4, 5]
#       ↑

Array#each passes 5 to the block, which in turn passes it to prc, which returns false, thus the block doesn't do anything.

Array#each increases the index, so now our array looks like this:

 [1, 4, 5]
#          ↑

Since index is past the end of the array, the iteration is done, and the result is [1, 4, 5].

As you can see, the problem is that you are mutating the array while iterating over it.

I've looked it up and understand #uniq removes duplicate values, right?

Correct, but that has nothing to do with why it makes your code work. Or, more precisely, makes it seem to work because there are actually a lot of other problems with your code as well.

It is not the fact that Array#uniq removes duplicate values that is relevant, it is the fact that Array#uniq returns a new array. Since Array#uniq returns a new array, you are no longer mutating an array while iterating over it, you are mutating one array while iterating over another.

You could have used any method of Array or one of its ancestors that returns a new array, for example Object#clone, Object#dup, Array#map, or Array#select, or even something really creative such as arr + []:

arr.select { true }.each {|ele| arr.delete(ele) if prc.call(ele) }

In fact, you don't even need to return an Array, you only need to return some kind of Enumerable, such as an Enumerator, for example using Array#each:

arr.each.each {|ele| arr.delete(ele) if prc.call(ele) }

edit: Looked into it more, is it because uniq will use the return value of the proc for comparison? Still not sure why this is needed for some numbers, but not all.

Since you are not passing the proc to Array#uniq, it cannot possibly use it for anything, so it is clearly impossible for this to be the explanation.

Note that, as I explained above, there are more problems with your code than just mutating an array while iterating over it.

Even if your original code did work, it would actually still be broken. You are using Array#delete to delete the element for which prc returns true. The problem is that Array#delete deletes all elements in the array that are #== to the element you are passing as an argument, even the ones for which the prc might return false.

Here is a trivial example:

a = [2, 2, 2]

i = -1

filter_out!(a) { (i += 1) == 1 }

This should only filter out the second element, but actually deletes all of them, so the result is [] when it actually should be [2, 2].

Your version with Array#uniq makes this problem even worse, because after running Array#uniq, the array doesn't even have a second element any more! So, the result is [2], when it actually should be [2, 2].

The next problem is the name of the method: filter_out!. Bang methods (i.e. methods whose name ends with an exclamation mark !) are used to mark the more surprising method of a pair of methods. You should never have a bang method on its own. The name filter_out! should only be used if and only if there is also a filter_out method. So, the method should be named filter_out.

And the last problem is that you are mutating the argument that is passed into the method. That is an absolute no-no. You never, ever mutate an argument that is passed into a method. Never. You never break somebody else's toys, you never touch somebody else's privates.

Mutation, in general, should be avoided as much as possible, since it can lead to confusing and hard to track down bugs, as you so beautifully have discovered yourself. Only if absolutely necessary should you mutate state, and only state that you yourself own. Never mutate somebody else's state. Instead, filter_out should return a new array with the results.

Here are a couple of examples what filter_out could look like:

def filter_out(enum)
  enum.select {|el| !yield el }
end

def filter_out(enum)
  enum.reduce([]) {|acc, el| if yield el then acc else acc + [el] end }
end

def filter_out(enum)
  enum.each_with_object([]) {|el, acc| acc << el unless yield el }
end

Note: Personally, I am not a big fan of Enumerable#each_with_object, because it relies on mutation, so I would avoid that solution as much as possible.

However, the real problem seems to be that whatever course or book or tutorial or class you are following seems to be pretty terrible, both a teaching Ruby, and at teaching good Software Engineering practices, such as testing and debugging.

Why is uniq needed in this problem? (Ruby)

Answers (2)

Related Questions