Reputation: 3630
What is the idiomatic clojure way to remove strings from an array of strings if there is case-insensitive match?
I need to preserve the case for the results (I always want to preserve the first occurence of insensitive match).
Simple example:
(distinct-case-insensitive ["fish" "Dog" "cat"] ["FISH "DOG"])
would return
["fish" "Dog" "cat"]
Upvotes: 1
Views: 1601
Reputation: 603
A solution is to implement a distinct-by
that allows to specify a function to apply on each element before checking the duplicates
(defn distinct-by [f coll]
(let [groups (group-by f coll)]
(map #(first (groups %)) (distinct (map f coll)))))
For the example case this can be used like
(distinct-by clojure.string/lower-case
(concat ["fish" "Dog" "cat"] ["FISH" "DOG"]))
; => ("fish" "Dog" "cat")
Upvotes: 0
Reputation: 2600
Here's a solution that meets your requirements (the first matching item "wins" and order is preserved), is lazy, and has the benefit of being a higher order function. It takes a keyfn
as its first argument, in correspondence with e.g. sort-by
and group-by
.
(defn distinct-by [keyfn coll]
(letfn [(step [xs seen]
(lazy-seq
((fn [xs]
(when-let [[x & more] (seq xs)]
(let [k (keyfn x)]
(if (seen k)
(recur more)
(cons x (step more (conj seen k)))))))
xs)))]
(step coll #{})))
So your usage would be:
(require '[clojure.string :as str])
(distinct-by str/lower-case ["fish" "Dog" "cat" "Fish" "DOG"])
;=> ("fish" "Dog" "cat")
The use of recur
and the inner anonymous function is a relatively minor optimization. clojure.core/distinct
uses it but in many cases it's not necessary. Here's a version without the extra noise:
(defn distinct-by [keyfn coll]
(letfn [(step [xs seen]
(lazy-seq
(when-let [[x & more] (seq xs)]
(let [k (keyfn x)]
(if (seen k)
(step more seen)
(cons x (step more (conj seen k))))))))]
(step coll #{})))
Upvotes: 1
Reputation: 51480
Obviously, you can't use build-in distinct here, so you should reimplement it yourself.
mishadoff's solution is really beautiful and clujur'ish, but it breaks the order of elements when there are more then 8 unique elements dye to clojure HashMap implementation.
The safest way to do what you want is to use reduce
:
(defn concat-distinct [& colls]
(first
(reduce (fn [[coll seen] el]
(let [lc-el (string/lower-case el)]
(if (contains? seen lc-el)
[coll seen]
[(conj coll el) (conj seen lc-el)])))
[[] #{}]
(apply concat colls))))
If works for any number of collections:
user=> (concat-distinct ["fish" "Dog" "cat"] ["FISH" "DOG"] ["snake"] ["CaT" "Camel"])
["fish" "Dog" "cat" "snake" "Camel"]
And for any number of distinct elements (unlike mishadoff's solution):
user=> (concat-distinct ["g" "h" "i" "j" "a" "b" "c" "d" "e" "f"])
["g" "h" "i" "j" "a" "b" "c" "d" "e" "f"]
In most cases you'll be fine with greedy solution. But if you want it to be lazy, then you won't be able to avoid recursion:
(defn lazy-concat-distinct [& colls]
((fn step [coll seen]
(lazy-seq
(loop [[el & xs :as s] coll]
(when (seq s)
(let [lc-el (string/lower-case el)]
(if (contains? seen lc-el)
(recur xs)
(cons el (step xs (conj seen lc-el)))))))))
(apply concat colls) #{}))
This solution uses lazy sequences:
user=> (def res (lazy-concat-distinct (lazy-seq (println :boo) ["boo"])))
user=> (count res)
:boo
1
You can make it even lazier using lazy-cat
macro:
(defmacro lazy-concat-distinct* [& colls]
`(lazy-concat-distinct (lazy-cat ~@colls)))
Now it won't even evaluate it's arguments until they are actually used:
user=> (def res (lazy-concat-distinct* (do (println :boo) ["boo"])))
user=> (count res)
:boo
1
It's useful when you want to aggregate data from some large database without downloading it all at once.
N.B. Be careful with lazy solutions. For example, this solution works almost 4 times slower than the greedy one.
Upvotes: 3
Reputation: 10789
This is solution I came up with. To simplify function it accepts just one list with duplicates, so if you need vararg lists (apply concat lists)
before.
(defn distinct-case-insensitive [xs]
(->> xs
(group-by clojure.string/lower-case)
(vals)
(map first)))
(distinct-case-insensitive ["fish" "Dog" "cat" "Fish" "DOG"]) =>
("fish" "Dog" "cat")
But, as Leonid mentioned it does not preserve order due to hashmap. For ordered solution use
(defn distinct-case-insesitive [xs]
(->> xs
(group-by clojure.string/lower-case)
(#(map % (map clojure.string/lower-case xs)))
(map first)
(distinct)))
Upvotes: 5