I would like to parallelize my Clojure implementation

Question

OK, so I have an algorithm that loops through a file line by line and looks for a given word in each line. It returns not only the given word but also a number of words (also given as a parameter) that come before and after that word.

E.g. line = "I am overflowing with blessings and you also are"
     parameters = ("you" 2)
     output = (blessings and you also are)
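
To make the expected behaviour concrete, here is a minimal, self-contained sketch of the windowing described above. The name context-windows is hypothetical and is not one of the functions posted further down; it only assumes the same split regex and that a window is clipped at the ends of the line.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (not from the original post): for every occurrence of
;; `word` in `line`, return the word plus up to `n` neighbours on each side.
(defn context-windows [line word n]
  (let [words (str/split line #"[ ,\.]+")]   ; str/split returns a vector
    (for [[i w] (map-indexed vector words)
          :when (= w word)]
      ;; clip the window to the bounds of the line
      (subvec words (max 0 (- i n)) (min (count words) (+ i n 1))))))

(context-windows "I am overflowing with blessings and you also are" "you" 2)
;; => (["blessings" "and" "you" "also" "are"])
```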

(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (line-seq r)]
    (let [x (topMostLoop l "good" 2)]
      (if (not (empty? x))
        (println x)))))

The above code works fine, but I would like to parallelize it, so I tried this:

(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (line-seq r)]
    (future
      (let [x (topMostLoop l "good" 2)]
        (if (not (empty? x))
          (println x))))))

But then the output comes out all messy. I know I need to lock somewhere, but I don't know where.

(defn topMostLoop [contents word next]
  (let [mywords (str/split contents #"[ ,\.]+")]
    (map (fn [element]
           (return-lines (max 0 (- element next))
                         (min (+ element next) (- (count mywords) 1))
                         mywords))
         (vec ((indexHashMap mywords) word)))))

I would be glad if someone could help; this is the last thing I have left to do.

NB: Let me know if I need to post the other functions as well.

I have added the other functions for more clarity:

(defn return-lines [firstItem lastItem contentArray]
  (take (+ (- lastItem firstItem) 1) 
        (map (fn [element] (str element))
             (vec (drop firstItem contentArray)))))

(defn indexHashMap [mywords]
  (->> (zipmap (range) mywords)     ; mywords is a sequence of words
       (reduce (fn [index [location word]]
                 (merge-with concat index {word (list location)})) {})))

Bojan Horvat · Accepted Answer

First, use map in the serial version:

(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (map #(topMostLoop %1 "good" 2) (line-seq r))]
    (if (not (empty? l))
        (println l))))

With this approach, topMostLoop is applied to each line and a lazy seq of results is returned. The body of doseq prints each result that is not empty.

After that, replace map with pmap, which runs the mapping in parallel while still returning the results in the same order as the input lines:

(with-open [r (clojure.java.io/reader "resources/small.txt")]
  (doseq [l (pmap #(topMostLoop %1 "good" 2) (line-seq r))]
    (if (not (empty? l))
        (println l))))

In your version with futures, the results will normally come out of order, because some later futures finish before earlier ones.
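
If you would rather keep futures, here is a rough sketch of two options, under some assumptions. slow-identity is a hypothetical stand-in for topMostLoop, and print-lock is just an arbitrary object used as a monitor; neither comes from your code. The first function starts every future and then derefs them in submission order, so the results come back ordered. The second lets each future print for itself but wraps println in locking, which should keep each line intact while leaving the order of the lines arbitrary.

```clojure
;; Hypothetical stand-in for topMostLoop: sleeps a random time, returns its input.
(defn slow-identity [x]
  (Thread/sleep (rand-int 20))
  x)

;; Option 1: start all futures eagerly (mapv), then deref in submission order.
(defn in-order-results [xs]
  (let [futs (mapv #(future (slow-identity %)) xs)]
    (mapv deref futs)))

;; Option 2: each future prints its own result; the lock serializes println
;; so lines are not interleaved, but their order is not guaranteed.
(def print-lock (Object.))

(defn print-whole-lines [xs]
  (let [futs (mapv #(future
                      (let [r (slow-identity %)]
                        (locking print-lock (println r))))
                   xs)]
    (run! deref futs)))   ; wait until every future has finished

(in-order-results (range 10))
;; => [0 1 2 3 4 5 6 7 8 9]
```

When running this as a script rather than at the REPL, calling (shutdown-agents) at the end avoids the JVM waiting on the idle future threads before exiting.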

I tested this with the following modifications (instead of reading a text file, I create a lazy sequence of vectors of numbers, search for a value in each vector, and return its surroundings):

(def lines (repeatedly #(shuffle (range 1 11))))
(def lines-10 (take 10 lines))

lines-10
([5 8 3 10 6 9 7 2 1 4]
[6 8 9 7 2 5 10 4 1 3]
[2 7 8 9 1 5 10 3 4 6]
[10 8 3 5 7 2 4 9 6 1]
[8 6 10 1 9 4 3 7 2 5]
[9 6 8 1 5 10 3 4 2 7]
[10 9 3 7 1 8 4 6 5 2]
[6 1 4 10 3 7 8 9 5 2]
[9 6 7 5 8 3 10 4 2 1]
[4 1 5 2 7 3 6 9 8 10])

(defn surrounding
 [v value size]
  (let [i (.indexOf v value)]
   (if (= i -1)
    nil
    (subvec v (max (- i size) 0) (inc (min (+ i size) (dec (count v))))))))

(doseq [l (map #(surrounding % 3 2) lines-10)] (if (not (empty? l)) (println l)))
[5 8 3 10 6]
[4 1 3]
[5 10 3 4 6]
[10 8 3 5 7]
[9 4 3 7 2]
[5 10 3 4 2]
[10 9 3 7 1]
[4 10 3 7 8]
[5 8 3 10 4]
[2 7 3 6 9]
nil

(doseq [l (pmap #(surrounding % 3 2) lines-10)] (if (not (empty? l)) (println l)))
[5 8 3 10 6]
[4 1 3]
[5 10 3 4 6]
[10 8 3 5 7]
[9 4 3 7 2]
[5 10 3 4 2]
[10 9 3 7 1]
[4 10 3 7 8]
[5 8 3 10 4]
[2 7 3 6 9]
nil
