rev

Reputation: 683

How do I solve: pandoc: ...:hGetContents: invalid argument (invalid byte sequence)

I'm running the following command in Terminal on Mac OS X 10.6.8:

find . -name \*.html -type f -exec pandoc -o {}.md {} \;

It parses some documents, but gives me this error on quite a few:

pandoc: ./Teaching/how_16825_make-lesson-book.html: hGetContents: invalid argument (invalid byte sequence)

Any idea how to fix this?

Upvotes: 5

Views: 5363

Answers (3)

PaulANormanNZ

Reputation: 191

I had the same problem. The Pandoc README.html file says this:

Pandoc uses the UTF-8 character encoding for both input and output. If your local character encoding is not UTF-8, you should pipe input and output through iconv:

iconv -t utf-8 input.txt | pandoc | iconv -f utf-8

Of course, you may need to install iconv first (Mac OS X already ships with it, I believe). For Windows:

http://gnuwin32.sourceforge.net/packages/libiconv.htm Gnu Win32

https://code.google.com/p/win-iconv/ Google Win-Iconv
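For the batch case in the question, the iconv pipe from the README could be wrapped in a couple of shell functions. This is only a rough sketch: the function names are made up for illustration, and it assumes that any file which is not already valid UTF-8 is ISO-8859-1 (Latin-1), which may not hold for your files.

```shell
#!/bin/sh
# Assumption: files that are not valid UTF-8 are ISO-8859-1 (Latin-1).
# Override by setting SRC_ENC in the environment.
SRC_ENC="${SRC_ENC:-ISO-8859-1}"

# Succeeds if the file already decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Write the file's contents to stdout as UTF-8.
to_utf8() {
    if is_utf8 "$1"; then
        cat "$1"
    else
        iconv -f "$SRC_ENC" -t UTF-8 "$1"
    fi
}

# Convert one HTML file to Markdown next to the original (needs pandoc).
html_to_md() {
    to_utf8 "$1" | pandoc -f html -t markdown -o "$1.md"
}

# Example use, after defining the functions above:
#   find . -name '*.html' -type f | while IFS= read -r f; do html_to_md "$f"; done
```

The UTF-8 check is there so files that are already fine are not converted a second time, which would garble their non-ASCII characters.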

Upvotes: 3

sinemetu1

Reputation: 1726

As kadeix said, this is a character encoding issue. Modifying the charset declaration in the HTML alone didn't do anything for me.

In Vim, I solved it by re-saving the file as UTF-8 with: :w ++enc=utf-8

Upvotes: 2

kadeix

Reputation: 1

I get this error when I try to parse a file encoded in Latin-1.

When you get this error, try saving the file as UTF-8 (and update the charset declaration in the HTML code) before running pandoc.
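If you would rather do that from the command line than in an editor, something like the following might work. It is a sketch under assumptions: resave_as_utf8 is a hypothetical helper name, the source encoding defaults to ISO-8859-1 unless you pass another one, and the sed pattern only catches a lowercase charset=... declaration.

```shell
#!/bin/sh
# Re-encode an HTML file to UTF-8 in place and update its charset declaration.
# Usage: resave_as_utf8 FILE [SOURCE_ENCODING]
# Assumption: the source encoding is ISO-8859-1 unless given as $2.
resave_as_utf8() {
    src_enc="${2:-ISO-8859-1}"
    # Convert first, so sed only ever sees valid UTF-8 text.
    iconv -f "$src_enc" -t UTF-8 "$1" > "$1.tmp1" &&
    # Rewrite charset=xyz, charset="xyz" or charset='xyz' to utf-8.
    sed 's/charset=\(["'\'']\{0,1\}\)[A-Za-z0-9_-]*/charset=\1utf-8/' \
        "$1.tmp1" > "$1.tmp2" &&
    mv "$1.tmp2" "$1"
    status=$?
    # Clean up temporary files whether or not the steps succeeded.
    rm -f "$1.tmp1" "$1.tmp2"
    return $status
}
```

The original file is only replaced if both the conversion and the rewrite succeed, so a wrong encoding guess that makes iconv fail leaves it untouched. Keep a backup anyway.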

Upvotes: 0
