Reputation: 180
I want to learn Haskell and I have another small project (currently in Elixir) that I'd like to port as an exercise. It is a simple web scraper that scrapes a list of urls.
Imagine having a list of zip codes, around 2500 items. For each entry, a web page should be scraped, in the form of http://www.acme.org/zip-info?zip={ZIP}. I managed to write the code to crawl a single web page using Scalpel.
But how would I go about scraping the 2500 items? In Elixir I map over the list of postal codes and after each page request there is a short sleep of 1 second, just to ease off pressure on the targeted website. It is not important to me to scrape the website as fast as possible.
How would I do this in Haskell? I read about threadSleep, but how do I use that in combination with traversing the list and the main function, since the sleep is a side effect?
Thanks for the insights!
Upvotes: 2
Views: 111
Reputation: 10645
Presumably you already have a function like:
scrapeZip :: Zip -> IO ZipResult
Then you can write a function with traverse to get an IO action that returns a list of zip results:
scrapeZips :: [Zip] -> IO [ZipResult]
scrapeZips zipCodes = traverse scrapeZip zipCodes
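To see why traverse fits here: for IO it runs the actions one after another, left to right, and collects the results in the same order as the input list. A minimal sketch of that behaviour, assuming Zip and ZipResult are plain strings and using a hypothetical fakeScrape stand-in (not a real scraper) that records each visit in an IORef:

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

type Zip = String        -- assumption: zip codes as plain strings
type ZipResult = String  -- assumption: placeholder result type

-- Hypothetical stand-in for scrapeZip: it records the visited zip code
-- and returns a dummy result instead of fetching a page.
fakeScrape :: IORef [Zip] -> Zip -> IO ZipResult
fakeScrape visited z = do
  modifyIORef' visited (++ [z])
  return ("info for " ++ z)

-- Runs fakeScrape over all zip codes with traverse and returns both the
-- order in which the actions ran and the collected results.
demo :: [Zip] -> IO ([Zip], [ZipResult])
demo zips = do
  visited <- newIORef []
  results <- traverse (fakeScrape visited) zips
  order <- readIORef visited
  return (order, results)
```

Both the visit order and the result list follow the order of the input, which is what makes it safe to attach a per-request side effect such as a sleep.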
But you want to add a delay, which can be done using threadDelay (you can import it from Control.Concurrent):
scrapeZipDelay :: Zip -> IO ZipResult
scrapeZipDelay zip = do
  x <- scrapeZip zip
  threadDelay 1000000 -- one second in microseconds
  return x
And then you can use this scrapeZipDelay with traverse:
scrapeZipsDelay :: [Zip] -> IO [ZipResult]
scrapeZipsDelay zipCodes = traverse scrapeZipDelay zipCodes
Instead of defining a whole new scrapeZipDelay function, you can also write a more compact version with the <* operator, which runs both actions in sequence and keeps the result of the first:
scrapeZipsDelay :: [Zip] -> IO [ZipResult]
scrapeZipsDelay zipCodes =
  traverse (\zip -> scrapeZip zip <* threadDelay 1000000) zipCodes
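Putting the pieces together, a self-contained sketch might look like the following. The Zip and ZipResult aliases, the placeholder body of scrapeZip, the configurable delay parameter, and the runScraper name are all assumptions for illustration; a real scrapeZip would fetch and parse the page with Scalpel.

```haskell
import Control.Concurrent (threadDelay)

type Zip = String        -- assumption: zip codes as plain strings
type ZipResult = String  -- assumption: placeholder result type

-- Placeholder for the real scraper; it performs no network request.
scrapeZip :: Zip -> IO ZipResult
scrapeZip z = return ("result for " ++ z)

-- Scrape every zip code in order, pausing for the given number of
-- microseconds after each request. (<*) keeps the scrape result and
-- discards the unit value returned by threadDelay.
scrapeZipsWithDelay :: Int -> [Zip] -> IO [ZipResult]
scrapeZipsWithDelay micros =
  traverse (\z -> scrapeZip z <* threadDelay micros)

-- Hypothetical entry point: scrape two zip codes with a one-second
-- pause after each, then print the results.
runScraper :: IO ()
runScraper = do
  results <- scrapeZipsWithDelay 1000000 ["1000", "2000"]
  mapM_ putStrLn results
```

In an actual program the body of runScraper would be your main. Because main is itself an IO action, the sleep needs no special handling: it is sequenced together with the requests inside the action that traverse builds.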
Upvotes: 4