Taylor Lapeyre
Taylor Lapeyre

Reputation: 105

Removing adjacent and equal elements in a collection

Say that I have a function:

(defn get-token [char]

  (defn char->number? []
    (re-matches #"\d" (str char)))

  (defn whitespace? []
    (or
      (= \space char)
      (= \newline char)))

  (defn valid-ident-char? []
    (re-matches #"[a-zA-Z_$]" (str char)))

  (cond
    (whitespace?)
    nil

    (= \' char)
    :quote

    (= \) char)
    :rparen

    (= \( char)
    :lparen

    (char->number?)
    :integer

    (valid-ident-char?)
    :identifier

    :else
    (throw (Exception. "invalid character"))))

When I run this function on, for instance, the string "(test 1 2)", I get a list of symbols for each character:

'(:lparen :identifier :identifier :identifier nil :integer nil :integer :rparen)

Seeing that this is not entirely what I want, I am trying to write a function that takes a collection and "condenses" the collection to combine adjacent elements that are equal.

A final example might do this:

(defn combine-adjacent [coll]
  implementation...)

(->>
  "(test 1 2)"
  (map get-token)
  (combine-adjacent)
  (remove nil?))

; => (:lparen :identifier :integer :integer :rparen)

What is the idiomatic Clojure way to achieve this?

Upvotes: 1

Views: 229

Answers (3)

Michał Marczyk
Michał Marczyk

Reputation: 84331

Clojure 1.7 will introduce a new function called dedupe to accomplish exactly this:

(dedupe [0 1 1 2 2 3 1 2 3])
;= (0 1 2 3 1 2 3)

If you're prepared to use 1.7.0-alpha2, you could use it today.

The implementation relies on transducers (dedupe produces a transducer when called with no arguments; the unary overload is defined simply as (sequence (dedupe) coll)), so it wouldn't be straightforward to backport.

Upvotes: 3

noisesmith
noisesmith

Reputation: 20194

There are a couple of tricks for comparing items to the one next door:

first, we can compare it to its tail:

(defn combine-adjacent
  [s]
  (mapcat #(when (not= % %2) [%]) (rest s) s))

alternatively, we can take the sequence by twos, and drop repeats

(defn combine-adjacent
  [s]
  (mapcat (fn [[a b]] (if (not= a b) [a]) ())
          (partition 2 1[0 1 2 2 2 3 2 3])))

both of these take advantage of the helpful property of concat when combined with map that you can return zero or more elements for the result sequence for each input. The empty list for the false case in the second version is not needed, but may help with clarity.

Upvotes: 1

soulcheck
soulcheck

Reputation: 36767

Not sure how idiomatic it is but one way to do it would be to use partition-by to group elements of the incoming sequence into lists containing subsequences of the same element and then use map to get the first element from each of those lists.

So, in code

(defn combine-adjacent [input] 
     (->> input (partition-by identity) (map first)))

or

(defn combine-adjacent [input] 
    (->> (partition-by identity input) (map first))

should work.

Upvotes: 1

Related Questions