Reputation: 16099
I need to make 200 or so HTTP requests. I want them to run in parallel, or batches, and I'm not sure where to start for doing this in Clojure. pmap appears to have the effect I want, for example, using http.async.client:
(defn get-json [url]
  (with-open [client (http/create-client)]
    (let [resp (http/GET client url)]
      (try
        (println 1)
        (http/string (http/await resp))
        (println "********DONE*********")
        nil
        (catch Exception e (println e) {})))))
music.core=> (pmap get-json [url url2])
1
1
********DONE*********
********DONE*********
(nil nil)
But I can't prove that the requests are actually executing in parallel. Do I need to call into the JVM's Thread APIs? I'm searching around and coming up with other libraries like Netty, Lamina, Aleph - should I be using one of these? Please just point me in the right direction for learning about the best practice/simplest solution.
Upvotes: 9
Views: 3217
Reputation: 6677
Take a look at Claypoole. Example code:
(require '[com.climate.claypoole :as cp])
;; Run with a thread pool of 100 threads, meaning up to 100 HTTP calls
;; will run simultaneously. with-shutdown! ensures the thread pool is
;; closed afterwards.
(cp/with-shutdown! [pool (cp/threadpool 100)]
  (cp/pmap pool get-json [url url2]))
The reason you should prefer com.climate.claypoole/pmap over clojure.core/pmap in this case is that the latter sets the number of threads based on the number of CPUs, with no way of overriding. For networking and other I/O operations that aren't CPU bound, you typically want to set the number of threads based on the desired amount of I/O, not based on CPU capacity.
Or use a non-blocking client like http-kit that doesn't require one thread per connection, as suggested by mikera.
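If adding a dependency is not an option, the same idea (sizing the pool for I/O rather than for CPU count) can be sketched with the JDK's own executors. This is only an illustration, not what Claypoole does internally; bounded-pmap is a hypothetical helper name, and the squaring function stands in for a real HTTP call such as get-json.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

(defn bounded-pmap
  "Hypothetical helper: applies f to each item of coll on a fixed pool
  of n threads and returns the results in input order."
  [n f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool n)]
    (try
      ;; Clojure functions implement Callable, so zero-argument
      ;; closures can be handed to invokeAll directly.
      (let [tasks   (mapv (fn [x] (fn [] (f x))) coll)
            futures (.invokeAll pool tasks)]
        ;; invokeAll blocks until every task has finished.
        (mapv (fn [^Future fut] (.get fut)) futures))
      (finally
        (.shutdown pool)))))

;; Stand-in workload: sleep briefly, then square the input.
(bounded-pmap 4 (fn [x] (Thread/sleep 100) (* x x)) (range 8))
;; => [0 1 4 9 16 25 36 49]
```

With the function from the question this would be called as (bounded-pmap 100 get-json urls), where urls is your own collection.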
Upvotes: 0
Reputation: 106351
Ideally you don't want to tie up a thread waiting for the result of each http request, so pmap or other thread-based approaches aren't really a good idea.
What you really want to do is fire off all the requests asynchronously, then wait for the results in a single thread.
My suggested approach is to use http-kit to fire off all the asynchronous requests at once, producing a sequence of promises. You then just need to dereference all these promises in a single thread, which will block the thread until all results are returned.
Something like:
(require '[org.httpkit.client :as http])
(let [urls (repeat 100 "http://google.com") ;; insert your URLs here
      promises (doall (map http/get urls))
      results (doall (map deref promises))]
  #_do_stuff_with_results
  (first results))
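One thing to watch with a plain deref is that a request that never completes will block forever. Clojure's deref also accepts a timeout and a fallback value, which works on promises. The sketch below shows that pattern using only clojure.core; fake-async-get is a hypothetical stand-in for an asynchronous HTTP call, and the response map it delivers is made up for illustration.

```clojure
(defn fake-async-get
  "Hypothetical stand-in for an async HTTP call: returns a promise
  immediately and delivers a made-up response map after delay-ms."
  [url delay-ms]
  (let [p (promise)]
    (future
      (Thread/sleep delay-ms)
      (deliver p {:url url :status 200}))
    p))

(let [urls     ["a" "b" "c"]
      ;; doall forces every request to start before we wait on any.
      promises (doall (map #(fake-async-get % 50) urls))
      ;; Wait at most 2 seconds per promise, falling back to an error map.
      results  (mapv #(deref % 2000 {:error :timeout}) promises)]
  (mapv :status results))
;; => [200 200 200]
```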
Upvotes: 14
Reputation: 5619
What you're describing is a perfectly good use of pmap and I'd approach it in similar fashion.
As far as 'proving' that it runs in parallel, you have to trust that pmap runs each call to the function on a worker thread rather than on the calling thread. However, a simple sanity check is to print the thread ID:
user=> (defn thread-id [_] (.getId (Thread/currentThread)))
user=> (pmap thread-id [1 2 3])
(53 11 56)
As the thread IDs are in fact different, meaning Clojure is running each call on a separate thread, you can safely trust the JVM will run your code in parallel.
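Another quick check is to compare wall-clock time against a sequential map. The sketch below uses a sleep as a stand-in for a slow HTTP request; slow-task and elapsed-ms are hypothetical helper names, and the exact timings will vary by machine.

```clojure
(defn slow-task
  "Stand-in for a slow request: sleeps for half a second, then returns x."
  [x]
  (Thread/sleep 500)
  x)

(defn elapsed-ms
  "Runs the zero-argument function thunk and returns the elapsed
  wall-clock time in milliseconds."
  [thunk]
  (let [start (System/nanoTime)]
    (thunk)
    (/ (- (System/nanoTime) start) 1e6)))

;; doall forces the lazy sequences so the work happens inside the timer.
(def serial-ms   (elapsed-ms #(doall (map slow-task (range 4)))))
(def parallel-ms (elapsed-ms #(doall (pmap slow-task (range 4)))))

(println "map: " serial-ms "ms")
(println "pmap:" parallel-ms "ms")
```

If the calls really overlap, the pmap figure should be close to the duration of a single task, while the map figure should be roughly four times that.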
Also have a look at other parallel functions such as pvalues and pcalls. They give you different semantics and might be the right answer depending on the problem at hand.
Upvotes: 6