Reputation: 65
I have a map that is sorted by its keys which contains data like this:
(def h {50 Text1
70 Text2
372 Text1
391 Text2
759 Text1
778 Text2
})
The map is sorted by Keys. The key (the number) can be interpreted as the position where the corresponding value was found in a large block of text. In the above example, "Text1" was found at position 50 in the text.
Now, I want to find all Texts that were found within k positions of each other. I define a function a like this:
(defn nearest [m k]
(for [m1 (keys m) m2 (keys m)
:when (and (> m2 m1) (not= (m m1) (m m2)) (< (- m2 m1) k))]
[m1 (get m m1) m2 (get m m2)]))
(nearest h 50)
; [[50 "Text1" 70 "Text2"] [372 "Text1" 391 "Text2"] [759 "Text1" 778 "Text2"]]
This works, but is too slow when the map m has 100s of thousands of elements. Because the for loop actually looks at all pairs of elements in the map. Since the map is sorted, for each element in the map, it is not necessary to check further elements, once the next element is already beyond k characters. I was able to write a version using loop and recur. But it is kind of unreadable. Is there a more natural way to do this using for? I am assuming for (:while ) should do the trick, but was not able to find a way.
(defn nearest-quick [m k]
(let [m1 (keys m) m2 (keys m)]
(loop [inp m res [] i (first m1) m1 (rest m1) j (first m2) m2 (rest m2)]
(cond
(nil? i) res
(nil? j)(recur inp res (first m1) (rest m1) j m2)
(= i j) (recur inp res i m1 (first m2) (rest m2))
(< j i) (recur inp res i m1 (first m2) (rest m2))
(= (inp i) (inp j)) (recur inp res i m1 (first m2) (rest m2))
(< (- j i) k) (recur inp (conj res [i (inp i) j (inp j)]) i m1 (first m2) (rest m2))
(>= (- j i) k) (recur inp res (first m1) (rest m1) (first (rest m1)) (rest (rest m1)))))))
Note: with a map with 42K elements, the first version takes 90 mins and the second version takes 3 mins.
Upvotes: 4
Views: 113
Reputation: 17299
One could probably exploit subseq
when the map is a sorted-map.
(defn nearest
[m n]
(for [[k v] m
[nk nv] (subseq m < k < (+ k n))
:when (not= v nv)]
[k v nk nv]))
Code not benchmarked.
Upvotes: 6
Reputation: 34840
Clojure's for
also has a :while
modifier, so you can stop the iteration with a condition.
Upvotes: 2
Reputation: 33657
From whatever I have understood from you example:
(def h (sorted-map 50 "Text1"
70 "Text2"
372 "Text1"
391 "Text2"
759 "Text1"
778 "Text2"))
(->> (map #(-> [%1 %2]) h (rest h))
(filter (fn [[[a b] [x y]]] (< (- x a) 50)))
(map flatten))
Upvotes: 0