Clojure - executing a bunch of HTTP requests in parallel - pmap?

I need to make around 200 HTTP requests. I want them to run in parallel, or in batches, and I'm not sure where to start in Clojure. pmap appears to have the effect I want, for example, using http.async.client:

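(require '[http.async.client :as http])

(defn get-json [url]
  ;; A new client is created (and closed by with-open) for every URL.
  (with-open [client (http/create-client)]
    (let [resp (http/GET client url)]
      (try
        (println 1)
        ;; Block until the response arrives; the body string is discarded here.
        (http/string (http/await resp))
        (println "********DONE*********")
        nil
        (catch Exception e (println e) {})))))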


music.core=> (pmap get-json [url url2])
1
1
********DONE*********
********DONE*********
(nil nil)

But I can't prove that the requests are actually executing in parallel. Do I need to call into the JVM's Thread APIs? Searching around, I keep finding other libraries such as Netty, Lamina, and Aleph; should I be using one of these? Please just point me in the right direction for learning about the best practice or simplest solution.

Upvotes: 9

Answers (3)

markusk

Take a look at Claypoole. Example code:

(require '[com.climate.claypoole :as cp])
;; Run with a thread pool of 100 threads, meaning up to 100 HTTP calls
;; will run simultaneously. with-shutdown! ensures the thread pool is
;; closed afterwards; doall realizes every result before that happens.
(cp/with-shutdown! [pool (cp/threadpool 100)]
  (doall (cp/pmap pool get-json [url url2])))

The reason you should prefer com.climate.claypoole/pmap over clojure.core/pmap in this case is that the latter sizes its parallelism from the number of CPUs (it keeps roughly the number of available processors plus 2 calls in flight), with no way to override it. For networking and other I/O operations that aren't CPU-bound, you typically want to set the number of threads based on the desired amount of I/O, not on CPU capacity.
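To make the thread-count point concrete without pulling in Claypoole, here is a rough JDK-only sketch of the same idea: a fixed-size java.util.concurrent thread pool whose size you choose for the amount of I/O you want in flight. This is not Claypoole's implementation; pmap-bounded and fake-request are hypothetical names, and fake-request only sleeps to stand in for get-json.

```clojure
(import '(java.util.concurrent Executors ExecutorService))

(defn fake-request
  "Hypothetical stand-in for get-json: sleeps to simulate network latency
  and reports which thread it ran on."
  [url]
  (Thread/sleep 200)
  {:url url :thread (.getName (Thread/currentThread))})

(defn pmap-bounded
  "Apply f to every element of coll on a fixed pool of n threads and
  return the results as a vector, in input order."
  [n f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n)]
    (try
      (->> coll
           (map (fn [x] (fn [] (f x))))   ; one Callable per element
           (.invokeAll pool)              ; blocks until every task is done
           (mapv deref))                  ; deref works on j.u.c.Future
      (finally
        ;; Always release the pool's threads, even if a task threw.
        (.shutdown pool)))))
```

For example, (pmap-bounded 20 fake-request (range 20)) should take roughly one 200 ms sleep rather than twenty, because all 20 calls wait at the same time; the pool size, not the CPU count, decides how many requests can be outstanding.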

Or use a non-blocking client like http-kit that doesn't require one thread per connection, as suggested by mikera.

Upvotes: 0

mikera

Ideally you don't want to tie up a thread waiting for the result of each HTTP request, so pmap and other thread-based approaches aren't really a good idea.

What you really want to do is:

1. Fire off all the requests asynchronously.
2. Wait for the results with just one thread.

My suggested approach is to use http-kit to fire off all the asynchronous requests at once, producing a sequence of promises. You then just need to dereference all these promises in a single thread, which will block the thread until all results are returned.

Something like:

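(require '[org.httpkit.client :as http])

(let [urls (repeat 100 "http://google.com") ;; insert your URLs here
      ;; http/get returns a promise immediately; doall forces every
      ;; request to be fired before we start waiting on any of them.
      promises (doall (map http/get urls))
      ;; deref blocks until each response map (:status, :body, ...) arrives.
      results (doall (map deref promises))]
  ;; do stuff with results here
  (first results))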
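If you would rather not add a dependency, the same fire-everything-then-wait pattern can be sketched with the JDK's built-in non-blocking client, java.net.http.HttpClient (Java 11 or later). This swaps http-kit for the JDK client; fetch-all is a hypothetical helper name, and it returns only the response bodies.

```clojure
(import '(java.net URI)
        '(java.net.http HttpClient HttpRequest HttpResponse
                        HttpResponse$BodyHandlers))

(defn fetch-all
  "Fire an asynchronous GET for every URL, then wait for all of them on the
  calling thread. Returns the response bodies as strings, in input order."
  [urls]
  (let [^HttpClient client (HttpClient/newHttpClient)
        ;; sendAsync returns a CompletableFuture immediately; mapv is eager,
        ;; so every request is in flight before we wait on any of them.
        futures (mapv (fn [url]
                        (.sendAsync client
                                    (.build (HttpRequest/newBuilder (URI/create url)))
                                    (HttpResponse$BodyHandlers/ofString)))
                      urls)]
    ;; deref blocks until each response has arrived.
    (mapv (fn [fut] (.body ^HttpResponse (deref fut))) futures)))
```

Only the calling thread blocks, once per result; the client itself does not need one waiting thread per request, which is the property this answer is after.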

Upvotes: 14

leonardoborges

What you're describing is a perfectly good use of pmap and I'd approach it in similar fashion.

As far as "proving" that it runs in parallel goes, you have to trust that pmap runs each call on a separate thread. However, a simple way to be certain is to print the thread id as a sanity check:

user=> (defn thread-id [_] (.getId (Thread/currentThread)))

user=> (pmap thread-id [1 2 3])

(53 11 56)

As the thread ids are in fact different, meaning Clojure is running each call on a different thread, you can safely trust the JVM will run your code in parallel.
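Distinct thread ids show that different threads were used, but not that the calls actually overlapped in time. A slightly stronger sanity check, sketched here with only clojure.core, records the largest number of calls in flight at once. peak-concurrency and slow-call are hypothetical names; slow-call just sleeps to stand in for an HTTP request.

```clojure
(defn peak-concurrency
  "Run (f x) for every x in xs using clojure.core/pmap and return the
  largest number of calls that were in flight at the same moment."
  [f xs]
  (let [in-flight (atom 0)
        peak      (atom 0)]
    (dorun
     (pmap (fn [x]
             (let [now (swap! in-flight inc)]
               (swap! peak max now)              ; remember the highest count seen
               (try (f x)
                    (finally (swap! in-flight dec)))))
           xs))
    @peak))

;; Hypothetical stand-in for an HTTP call: blocks for 200 ms.
(defn slow-call [_] (Thread/sleep 200) :ok)

(peak-concurrency slow-call (range 10))
```

If pmap ran the calls one after another, the peak would be 1; anything above 1 means the calls really overlapped. The exact number depends on your CPU count and on how pmap consumes the input sequence.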

Also have a look at other parallel functions such as pvalues and pcalls. They give you different semantics and might be the right answer depending on the problem at hand.
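As a quick illustration of the difference: pvalues takes expressions, pcalls takes zero-argument functions, and both return a lazy sequence of the results in order, evaluated in parallel. Both are built on pmap, so the same thread-count behaviour applies. slow is a hypothetical stand-in for a request.

```clojure
(defn slow
  "Hypothetical stand-in for an HTTP call: sleeps, then returns a value."
  [x]
  (Thread/sleep 200)
  (* x 10))

;; pvalues: each argument is an expression, evaluated in parallel.
(pvalues (slow 1) (slow 2) (slow 3))   ; => (10 20 30)

;; pcalls: each argument is a zero-argument function, called in parallel.
(pcalls #(slow 1) #(slow 2))           ; => (10 20)
```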

Upvotes: 6
