Doug
Doug

Reputation: 35136

How do you find the intersection of sets of maps by key in clojure?

This function takes a set of vectors of numbers and returns the intersection of them:

(defn foo [& sets] (apply clojure.set/intersection (map #(set %) sets)))

eg.

user=> (foo [1 2 3] [3 4 5 1] [33 3 3 1])
#{1 3}

How can you achieve the same sort of thing, given the elements of the input sets are maps with an :id key to use as the unique property of the map?

ie. I'm trying to write a function foo2 that does this:

user=> (foo2 [{:id 1 ...} {:id 2 ...} {:id 3 ...}] 
             [{:id 3 ...} {:id 4 ...} {:id 5 ...} {:id 1 ...}]
             [{:id 33 ...} {:id 3 ...} {:id 3 ...} {:id 1 ...}])
[{:id 1 ...} {:id 3 ...}]

Is there an easy way to do that?

Or is the only way to collect the ids, get the intersection of the keys and then lookup the keys in the input sets?

Is there any way to 'overload' the set functions (union, intersection, etc) by defining a 'key-value' function that returns the key for an arbitrary object, or is it only ever possible to use sets with primitive values?

Upvotes: 3

Views: 1181

Answers (2)

ProjectFrank
ProjectFrank

Reputation: 642

EDIT: I previously incorrectly assumed maps with the same :id are guaranteed to have the same keys and values. If they are allowed to be different, this is a better answer:

(defn intersect-by [key-fn data]
  (let [normalized (map (fn [map-data]
                          (->> map-data
                               (group-by key-fn)
                               (map (fn [[key val]]
                                      [key (apply merge val)]))
                               (into {})))
                        data)
        normalized-keys (map (comp set keys) normalized)
        intersection (apply clojure.set/intersection normalized-keys)
        merged-data (apply merge-with merge normalized)]
    (vals (select-keys merged-data intersection))))
#'user/foo

(def data [[{:id 1} {:id 2} {:id 3, :x 3}]
           [{:id 3, :y 4} {:id 4} {:id 5} {:id 1}]
           [{:id 33} {:id 3} {:id 3, :z 5} {:id 1}]])
#'user/data

(intersect-by :id data)
({:id 1} {:id 3, :x 3, :y 4, :z 5})

If you assume two maps with the same :id are guaranteed to have the same keys and values, your first foo function does this already:

(def data [[{:id 1} {:id 2} {:id 3}]
           [{:id 3} {:id 4} {:id 5} {:id 1}]
           [{:id 33} {:id 3} {:id 3} {:id 1}]])
#'user/data

(defn foo [& sets] (apply clojure.set/intersection (map #(set %) sets)))
#'user/foo

(apply foo data)
#{{:id 1} {:id 3}}

Upvotes: 2

leetwinski
leetwinski

Reputation: 17859

i would do almost the same: clojure.set/intersection, after converting every vector to sorted set by :id:

user> (defn intersect [& colls]
        (apply clojure.set/intersection
               (map #(apply sorted-set-by
                            (fn [{id1 :id} {id2 :id}] (compare id1 id2))
                            %)
                    colls)))
#'user/intersect

user> (intersect [{:id 1} {:id 2} {:id 3}] 
                 [{:id 3} {:id 4} {:id 5} {:id 1}]
                 [{:id 33} {:id 3} {:id 3} {:id 1}])
#{{:id 1} {:id 3}}

notice that this approach would treat different maps with the same :id value as equal, but i guess that's what you need. By the way, sorted-set is exactly the way to "overload" set functions that you are asking about.

you can make a helper function to make it look more generic:

user> (defn key-comparator [key]
        #(compare (key %1) (key %2)))
#'user/key-comparator

user> (defn intersect [& colls]
        (apply clojure.set/intersection
               (map #(apply sorted-set-by (key-comparator :id) %)
                    colls)))
#'user/intersect

user> (intersect [{:id 1} {:id 2} {:id 3 :x 1}] 
                 [{:id 3 :x 10} {:id 4} {:id 5} {:id 1}]
                 [{:id 33} {:id 3 :x 33} {:id 3 :x 2} {:id 1}])
#{{:id 1} {:id 3, :x 33}}

another variant is to filter all the items from the first coll, whose id is present in all the other colls:

user> (defn intersect-2 [c & cs]
        (if (empty? cs) c
            (filter (fn [{id :id :as itm}]
                      (every? (partial some #(when (= id (:id %)) %))
                              cs))
                    c)))
#'user/intersect-2

user> (intersect-2 [{:id 1} {:id 2} {:id 3 :x 1}] 
                   [{:id 3 :x 10} {:id 4} {:id 5} {:id 1}]
                   [{:id 33} {:id 3 :x 33} {:id 3 :x 2} {:id 1}])
({:id 1} {:id 3, :x 1})

Upvotes: 2

Related Questions