ljs.dev

Reputation: 4493

Pandoc returning Either type

I'm not sure where to go next in debugging this issue. Searching turned up errors that are similar, but not quite the same.

Relevant lines from this Haskell script

import Text.Pandoc (writePlain, readHtml, def, pandocVersion)

convert :: String -> String
convert = writePlain def . readHtml def

result in this error:

Main.hs:11:28: error:
    • Couldn't match type ‘Either
                             Text.Pandoc.Error.PandocError Text.Pandoc.Definition.Pandoc’
                     with ‘Text.Pandoc.Definition.Pandoc’
      Expected type: String -> Text.Pandoc.Definition.Pandoc
        Actual type: String
                     -> Either
                          Text.Pandoc.Error.PandocError Text.Pandoc.Definition.Pandoc
    • In the second argument of ‘(.)’, namely ‘readHtml def’
      In the expression: writePlain def . readHtml def
      In an equation for ‘convert’:
          convert = writePlain def . readHtml def

Environment details: pandoc was installed with cabal install.

Thanks to the answer, a comment, and a few hours of banging my head against the wall, I have a working solution:

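import Network.HTTP.Conduit (simpleHttp)
import qualified Data.Set as Set
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Encoding as TLE
import Text.Pandoc
import Text.Pandoc.Error (handleError)

-- Parse HTML and render it as plain text.
-- handleError unwraps the Right value and aborts on a Left.
htmlToPlainText :: String -> String
htmlToPlainText = writePlain opts . handleError . readHtml def
  where
    -- Default writer options with the raw HTML extension removed.
    opts = def
      { writerExtensions = Set.filter (/= Ext_raw_html) (writerExtensions def)
      }

main :: IO ()
main = do
    -- Fetch the page as a lazy ByteString.
    response <- simpleHttp "https://leonstafford.github.io"

    -- Decode the bytes as UTF-8, then convert to String for readHtml.
    let body = TLE.decodeUtf8 response
    let bodyAsString = TL.unpack body

    putStrLn $ htmlToPlainText bodyAsString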

Upvotes: 0

Views: 92

Answers (1)

Erik

Reputation: 957

Take a look at the types of the two functions you're trying to compose:

readHtml :: ReaderOptions -> String -> Either PandocError Pandoc

readHtml is an operation that can fail. To represent this, it returns either a PandocError or a valid Pandoc.

writePlain :: WriterOptions -> Pandoc -> String

writePlain accepts only a valid Pandoc, so it cannot be composed directly with readHtml, whose result is wrapped in Either.

The program must handle both cases:

  1. readHtml returns a Left/error value
  2. readHtml returns a Right/valid value

This can be done in various ways, but for example:

import Text.Pandoc (writePlain, readHtml, def, pandocVersion)

convert :: String -> String
convert html = case readHtml def html of
  Left err  -> show err            -- parsing failed: return the error as text
  Right doc -> writePlain def doc  -- parsing succeeded: render as plain text
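
Two of the other ways can be sketched without Pandoc at all. In the block below, Doc, ParseError, parseDoc and render are hypothetical stand-ins (not Pandoc's API) that only mimic the shapes of readHtml def and writePlain def, so treat it as an illustration of the pattern rather than drop-in code:

```haskell
-- Hypothetical stand-ins for Pandoc's types; only base is needed.
newtype Doc = Doc String
newtype ParseError = ParseError String deriving Show

-- Plays the role of `readHtml def`: parsing may fail.
parseDoc :: String -> Either ParseError Doc
parseDoc "" = Left (ParseError "empty input")
parseDoc s  = Right (Doc s)

-- Plays the role of `writePlain def`: needs a valid document.
render :: Doc -> String
render (Doc s) = "plain: " ++ s

-- Collapse both cases into a String, like the case expression above:
-- `either` applies `show` to a Left and `render` to a Right.
convertOrShow :: String -> String
convertOrShow = either show render . parseDoc

-- Keep the failure in the type and let the caller decide what to do:
-- `fmap` applies `render` only to a Right and passes a Left through.
convertEither :: String -> Either ParseError String
convertEither = fmap render . parseDoc
```

The first version hides the error inside the result string. The second is usually easier to work with in a larger program, because the caller can still tell success from failure.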

Either a b is similar to Maybe a, if you're familiar with that, except that it can carry extra information in the failure case. By convention, the Left constructor represents an error value and the Right constructor represents a normal value.
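
To make the comparison concrete, here is a small sketch unrelated to Pandoc. The function names parseAgeMaybe and parseAgeEither are made up for illustration; readMaybe comes from Text.Read in base:

```haskell
import Text.Read (readMaybe)

-- Maybe: a failure is just Nothing, with no explanation attached.
parseAgeMaybe :: String -> Maybe Int
parseAgeMaybe = readMaybe

-- Either: a Left carries a message describing what went wrong,
-- a Right carries the normal value.
parseAgeEither :: String -> Either String Int
parseAgeEither s = case readMaybe s of
  Nothing -> Left ("not a number: " ++ show s)
  Just n
    | n < 0     -> Left ("negative age: " ++ show n)
    | otherwise -> Right n
```

With Maybe, both kinds of bad input would look the same to the caller. With Either, the Left value says which one occurred.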

Upvotes: 2
