Reputation: 106351
I need to very efficiently compare two maps in Clojure/Java, and return the difference as determined by Java's .equals(..), with nil/null equivalent to "not present".
i.e. I am looking for the most efficient way to a write a function like:
(map-difference
{:a 1, :b nil, :c 2, :d 3}
{:a 1, :b "Hidden", :c 3, :e 5})
=> {:b nil, :c 2, :d 3, :e nil}
I'd prefer an immutable Clojure map as output, but a Java map would also be fine if the performance improvement would be significant.
For what it's worth, my basic test case / expectation of behaviour is that the following will be equal (up to the equivalence of null = "Not present") for any two maps a and b:
a
(merge b (difference a b))
What would be the best way to implement this?
Upvotes: 18
Views: 12370
Reputation: 25096
What about...
(defn map-diff [m1 m2]
;; m1: hashmap
;; m2: hashmap
;; => the difference between them
(reduce merge
(map #(hash-map % (- (or (% m1) 0) (or (% m2) 0)))
(keys (merge m1 m2)))))
Upvotes: 0
Reputation: 27677
You could also just use Maps.difference(..) method from Google's Guava libraries
Upvotes: 2
Reputation: 84331
I'm not sure what the absolutely most efficient way to do this is, but here's a couple of things which may be useful:
The basic expectation of behaviour from the question text is impossible: if a
and b
are maps such that b
contains at least one key not present in a
, (merge b <sth>)
cannot be equal to a
.
If you end up going with an interop solution but then need to go back to a PersistentHashMap
at some point, there's always
(clojure.lang.PersistentHashMap/create
(doto (java.util.HashMap.)
(.put :foo 1)
(.put :bar 2)))
; => {:foo 1 :bar 2}
If you need to pass the keyset of a Clojure map to a Java method, you can use
(.keySet {:foo 1 :bar 2})
; => #< [:foo, :bar]>
If all keys involved are guaranteed to be Comparable
, this could be exploited for efficient computation of difference
on maps with many keys (sort & merge scan). For unconstrained keys this is of course a no-go and for small maps it could actually hurt performance.
It's good to have a version written in Clojure, if only to set a baseline performance expectation. Here is one: (updated)
(defn map-difference [m1 m2]
(loop [m (transient {})
ks (concat (keys m1) (keys m2))]
(if-let [k (first ks)]
(let [e1 (find m1 k)
e2 (find m2 k)]
(cond (and e1 e2 (not= (e1 1) (e2 1))) (recur (assoc! m k (e1 1)) (next ks))
(not e1) (recur (assoc! m k (e2 1)) (next ks))
(not e2) (recur (assoc! m k (e1 1)) (next ks))
:else (recur m (next ks))))
(persistent! m))))
I think that just doing (concat (keys m1) (keys m2))
and possibly duplicating some work is likely more efficient most of the time than checking a given key is in "the other map" too at every step.
To wrap up the answer, here's a very simple-minded set-based version with the nice property that it says what it does -- if I misunderstood the spec, it should be readily apparent here. :-)
(defn map-difference [m1 m2]
(let [ks1 (set (keys m1))
ks2 (set (keys m2))
ks1-ks2 (set/difference ks1 ks2)
ks2-ks1 (set/difference ks2 ks1)
ks1*ks2 (set/intersection ks1 ks2)]
(merge (select-keys m1 ks1-ks2)
(select-keys m2 ks2-ks1)
(select-keys m1
(remove (fn [k] (= (m1 k) (m2 k)))
ks1*ks2)))))
Upvotes: 12
Reputation: 10850
I am not sure about its performance
(defn map-difference
[orig other]
(let [changed (set/difference (set orig) (set other))
added (set/difference (set (keys other)) (set (keys orig)))]
(reduce (fn [acc key]
(assoc acc key :missing))
(into {} changed)
added)))
I used :missing
key to avoid ambiguity between a nil
value in the original map, and a missing key -- you can of course change it to nil
.
Upvotes: 3
Reputation: 328556
Use the built-in collections API:
Set<Map.Entry<K,V>> difference = a.entrySet().removeAll(b.entrySet());
If you need to convert that back into a map, you must iterate. In that case, I suggest:
Map<K,V> result = new HashMap<K,V>(Math.max(a.size()), b.size()));
Set<Map.Entry<K,V>> filter = b.entrySet();
for( Map.Entry<K,V> entry : a.entrySet ) {
if( !filter.contains( entry ) {
result.put(entry.getKey(), entry.getValue());
}
}
Upvotes: 3
Reputation: 5881
Clojure maps will be fine because reading clojure maps is very fast.
I can't answer you but I can give you something to look at. Brenton Ashworth made a testtool where he solved the problem with map compares. Maybe you can look at his code to get hint for implementation. http://formpluslogic.blogspot.com/2010/07/better-clojure-test-results-with-deview.html and http://github.com/brentonashworth/deview
I don't think there is a better implementation that compare the keys and look up if the are different.
Upvotes: 3
Reputation: 2333
In Java, Google Commons Collections offer a quite performant solution.
Upvotes: 6