Reputation: 65

Idiomatic way to use for, while still maintaining high performance

I have a map that is sorted by its keys which contains data like this:

    (def h {50 Text1
    70 Text2
    372 Text1
    391 Text2
    759 Text1
    778 Text2
    })

The map is sorted by Keys. The key (the number) can be interpreted as the position where the corresponding value was found in a large block of text. In the above example, "Text1" was found at position 50 in the text.

Now, I want to find all Texts that were found within k positions of each other. I define a function a like this:

     (defn nearest [m k]
         (for [m1 (keys m) m2 (keys m)
              :when (and (> m2 m1) (not= (m m1) (m m2)) (< (- m2 m1) k))]
              [m1 (get m m1)  m2 (get m m2)]))

     (nearest h 50)
     ; [[50 "Text1" 70 "Text2"] [372 "Text1" 391 "Text2"] [759 "Text1" 778 "Text2"]]

This works, but is too slow when the map m has 100s of thousands of elements. Because the for loop actually looks at all pairs of elements in the map. Since the map is sorted, for each element in the map, it is not necessary to check further elements, once the next element is already beyond k characters. I was able to write a version using loop and recur. But it is kind of unreadable. Is there a more natural way to do this using for? I am assuming for (:while ) should do the trick, but was not able to find a way.

(defn nearest-quick [m k]
      (let [m1 (keys m) m2 (keys m)]
        (loop [inp m res []  i (first m1) m1 (rest m1) j (first m2) m2 (rest m2)]
          (cond
            (nil? i) res
            (nil? j)(recur inp res (first m1) (rest m1) j m2)
            (= i j) (recur inp res i m1 (first m2) (rest m2))
            (< j i) (recur inp res i m1 (first m2) (rest m2))
            (= (inp i) (inp j)) (recur inp res i m1 (first m2) (rest m2))
            (< (- j i) k) (recur inp (conj res [i (inp i) j (inp j)]) i m1 (first m2) (rest m2))
            (>= (- j i) k) (recur inp res (first m1) (rest m1) (first (rest m1)) (rest (rest m1)))))))

Note: with a map with 42K elements, the first version takes 90 mins and the second version takes 3 mins.

Upvotes: 4

Answers (3)

kotarak

Reputation: 17299

One could probably exploit subseq when the map is a sorted-map.

(defn nearest
  [m n]
  (for [[k v]   m
        [nk nv] (subseq m < k < (+ k n))
        :when (not= v nv)]
    [k v nk nv]))

Code not benchmarked.

Upvotes: 6

Michiel Borkent

Reputation: 34840

Clojure's for also has a :while modifier, so you can stop the iteration with a condition.

Upvotes: 2

Ankur

Reputation: 33657

From whatever I have understood from you example:

(def h (sorted-map 50 "Text1"
                   70 "Text2"
                   372 "Text1"
                   391 "Text2"
                   759 "Text1"
                   778 "Text2"))


(->> (map #(-> [%1 %2]) h (rest h))
     (filter (fn [[[a b] [x y]]] (< (- x a) 50)))
     (map flatten))

Upvotes: 0

Idiomatic way to use for, while still maintaining high performance

Answers (3)

Related Questions