Jonathan

Reputation: 11321

How can I download a file from the Internet using Haskell?

I'm just trying to do something similar to wget, where I download a file from the Internet. I saw that there used to be a package called http-wget, but that it's been deprecated in favor of http-conduit.

http-conduit has a simple example of how to fetch the contents of a web page using httpBS. Following that, I got this to work:

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Simple
import qualified Data.ByteString.Char8 as B8

main :: IO ()
main = do
  let url = "https://www.example.com/sitemap.xml"
  resp <- httpBS url
  B8.putStrLn $ getResponseBody resp

And this works for getting the filename (sitemap.xml) from the URL:

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Simple
import qualified Data.ByteString.Char8 as B8

main :: IO ()
main = do
  let url = "https://www.example.com/sitemap.xml"
  let urlParts = B8.split '/' $ B8.pack url
  let fileName = Prelude.last urlParts
  B8.putStrLn fileName

But I can't put them together:

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Simple
import qualified Data.ByteString.Char8 as B8

main :: IO ()
main = do
  let url = "https://www.example.com/sitemap.xml"
  let urlParts = B8.split '/' $ B8.pack url
  let fileName = Prelude.last urlParts
  resp <- httpBS url
  B8.putStrLn $ getResponseBody resp

That gives the error:

ny1920-parse.hs:12:41: error:
    • Couldn't match type ‘Request’ with ‘[Char]’
      Expected type: String
        Actual type: Request
    • In the first argument of ‘B8.pack’, namely ‘url’
      In the second argument of ‘($)’, namely ‘B8.pack url’
      In the expression: B8.split '/' $ B8.pack url
   |
12 |   let urlParts = B8.split '/' $ B8.pack url
   |                                         ^^^

So I just need to convert String -> Request? There's apparently a function for that in http-conduit, but it doesn't work as expected—I still get the same error.

I can force the URL to be a Request like this:

  let url = "https://www.example.com/sitemap.xml" :: Request

But then of course that breaks the part where I break up the filename, because it expects a [Char] and not a Request.

So I'm stuck—if I make the URL a String, it breaks http-conduit. And if I make it a Request, it breaks the string manipulation.

I feel like something this simple shouldn't be this hard, no?

Edit: Ok, so I can almost get it to work with this addition:

  let urlParts = B8.split '/' $ B8.pack (show url)

That compiles, but it makes the filename corrupt. Trying to print out the filename gives: "1.1\n}\n" instead of sitemap.xml.

Upvotes: 3

Views: 708

Answers (2)

EDIT: Looking at this again 2 months later, it's really just the monomorphism restriction that keeps the original code from working. I'll leave this here as a more specific workaround for this case, as opposed to the general workarounds for the monomorphism restriction.


Try this:

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Simple
import qualified Data.ByteString.Char8 as B8

main :: IO ()
main = do
  let url = "https://www.example.com/sitemap.xml"
  let urlParts = B8.split '/' $ B8.pack url
  let fileName = Prelude.last urlParts
  req <- parseRequest url
  resp <- httpBS req
  B8.putStrLn $ getResponseBody resp

You can't have url simultaneously be a String for B8.pack and a Request for httpBS. By calling parseRequest manually to get a Request instead of letting the IsString instance do it, we now have url :: String and req :: Request.
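
For comparison, here is a rough sketch of the general workaround: give url an explicit polymorphic signature, so the monomorphism restriction no longer pins it to a single type. I've moved url and fileName to the top level only to keep the example short; the same signature should work just as well on the let binding inside main. Treat it as an illustration rather than the recommended way to do this.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.String (IsString)
import Network.HTTP.Simple
import qualified Data.ByteString.Char8 as B8

-- The explicit signature keeps url polymorphic, so every use site
-- can pick its own IsString instance.
url :: IsString a => a
url = "https://www.example.com/sitemap.xml"

-- Here url is used as a String, because that is what B8.pack expects.
fileName :: B8.ByteString
fileName = last (B8.split '/' (B8.pack url))

main :: IO ()
main = do
  -- Here url is used as a Request, via the IsString instance for Request.
  resp <- httpBS url
  B8.putStrLn fileName
  B8.putStrLn (getResponseBody resp)
```

The literal is now effectively parsed once per use site, which is fine for a constant but is one reason I'd still reach for parseRequest when the URL comes from user input.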

Upvotes: 3

Daniel Wagner

Reputation: 153102

I'm going to disagree with the other answer here: splitting on / yourself is a bad idea. Don't try to implement an ad-hoc URL parser; it's way harder than you think. Instead, re-use the parse that you already have:

{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Client
import Network.HTTP.Simple
import Network.URI
import qualified Data.ByteString.Char8 as B8

main :: IO ()
main = do
    let request = "https://www.example.com/sitemap.xml"
        fileName = Prelude.last . pathSegments . getUri $ request
    resp <- httpBS request
    B8.putStrLn $ getResponseBody resp

See the Network.URI documentation (in the network-uri package) for more on the parts you can extract from a URI.
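
To get closer to the wget-like behaviour the question asks for, the fileName computed above can be used to write the response body to disk. The following is only a sketch under a few assumptions of mine: localFileName is a hypothetical helper, the "index.html" fallback mimics what wget does when the path has no final segment, and the status check is there because, as far as I know, httpBS does not throw on non-2xx responses when the request is built from a string literal. It needs the http-conduit, http-client, network-uri and bytestring packages.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Network.HTTP.Client (getUri)
import Network.HTTP.Simple
import Network.URI (pathSegments)
import qualified Data.ByteString as B

-- Hypothetical helper: use the last path segment of the request's URI
-- as the local file name, falling back to "index.html" when the path
-- has no segments or ends in an empty one.
localFileName :: Request -> FilePath
localFileName req =
  case reverse (pathSegments (getUri req)) of
    (seg : _) | not (null seg) -> seg
    _ -> "index.html"

main :: IO ()
main = do
  let request = "https://www.example.com/sitemap.xml" :: Request
      fileName = localFileName request
  resp <- httpBS request
  -- Only write the body on a 200; other statuses are just reported.
  if getResponseStatusCode resp == 200
    then do
      B.writeFile fileName (getResponseBody resp)
      putStrLn ("Saved " ++ fileName)
    else putStrLn ("Unexpected status: " ++ show (getResponseStatusCode resp))
```

The file name comes from the same parse that httpBS uses, so there is no second, hand-rolled URL parser that could disagree with it.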

Upvotes: 5
