Mauricio Estevez
Mauricio Estevez

Reputation: 417

Better Sequence Duplicate Remover

I made this function to remove consecutive duplicates, but I wanted to know if there was a better or shorter way to express it using distinct or something like that.

(defn duplicates
  [s]
  (reduce 
    #(if-not (= (last %1) %2) 
      (conj %1 %2) %1) 
    [] s))

Upvotes: 4

Views: 1498

Answers (2)

Leon Grapenthin
Leon Grapenthin

Reputation: 9266

clojure-1.7.0-alpha1 has a correct version of this function under the name dedupe.

The one you quoted returns its input sequence without consecutive duplicates. (Almost certainly) unwittingly, it also swallows all successive nil values if they begin the input sequence.

#(if-not (= (last %1) %2)
    (conj %1 %2)
    %1)

The lambda to reduce says: If the last element of the accumulator (%1) is unequal to the next input element (%2), add it to the accumulator, otherwise return the accumulator.

Because (last []) evaluates to nil it will never add nil values while the accumulator is empty. I leave fixing that as an exercise to the reader:

Make sure that duplicates returns the expected result [nil true nil] for input [nil nil true true nil].

Note: When operating with a vector, using peek performs significantly better than last.

EDIT (Since you edited your question): distinct returns each value of the input sequence only once. Unlike set it returns lazy-sequence.

A more idiomatic way to write duplicates/dedupe is the one that A. Webb posted as a comment (since it is also lazy). Otherwise, fixing the lambda to work correctly with an empty accumulator as its input and using peek instead of last would be more idiomatic.

Instead of fixing the lambda, in clojure-1.7.0-alpha1 you would use the dedupe transducer for eager evaluation, e. g.:

(into [] (dedupe) [nil nil true true nil])
-> [nil true nil] 

Or for lazy evaluation:

(dedupe [nil nil true true nil])
-> (nil true nil)

Upvotes: 7

Diego Basch
Diego Basch

Reputation: 13069

The anonymous function inside the reduce conjs an element to a sequence if it's different from the last element. You could rewrite it like this:

(defn maybe-conj [s e]
  (if (= (last s) e)
     s
     (conj s e)))

Then you could rewrite compress-sequence as:

(defn compress-sequence [s]
  (reduce maybe-conj [] s))

What this will do is go through each element of a sequence, and append it to the initial empty vector only if it's different from the last currently in that vector. The output will be a vector of the initial sequence with any runs removed. Example:

(compress-sequence [1 1 1 1 1 2 3 4 5 5 5 5 6 6 6 5 6 6 7 8 9])

evaluates to

[1 2 3 4 5 6 5 6 7 8 9]

Upvotes: 4

Related Questions