Reputation: 683
I'm running the following command in Terminal on Mac OS X 10.6.8:
find . -name \*.html -type f -exec pandoc -o {}.md {} \;
It parses some documents, but gives me this error on quite a few:
pandoc: ./Teaching/how_16825_make-lesson-book.html: hGetContents: invalid argument (invalid byte sequence)
Any idea how to fix this?
Upvotes: 5
Views: 5363
Reputation: 191
I had the same problem. The Pandoc README.html file says this:
Pandoc uses the UTF-8 character encoding for both input and output. If your local character encoding is not UTF-8, you should pipe input and output through iconv:
iconv -t utf-8 input.txt | pandoc | iconv -f utf-8
Of course, you may need to install iconv first (I believe Mac OS X already has it). For Windows:
http://gnuwin32.sourceforge.net/packages/libiconv.htm Gnu Win32
https://code.google.com/p/win-iconv/ Google Win-Iconv
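Before converting anything, it can help to find out which files are not valid UTF-8. Here is a minimal sketch that relies on iconv exiting with a non-zero status when it hits an invalid byte sequence. The function names `is_utf8` and `list_non_utf8` are made up for illustration, and the loop assumes file names contain no newlines.

```shell
#!/bin/sh
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# iconv exits non-zero when it meets an invalid byte sequence.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# list_non_utf8 DIR: print every .html file under DIR that fails the check.
# (Hypothetical helper; assumes no newlines in file names.)
list_non_utf8() {
    find "$1" -name '*.html' -type f | while IFS= read -r f; do
        is_utf8 "$f" || printf '%s\n' "$f"
    done
}
```

Running `list_non_utf8 .` from the top of the tree should print the files that are likely to trigger the hGetContents error.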
Upvotes: 3
Reputation: 1726
As kadeix said, this is a character encoding issue. Modifying the charset declaration in the HTML didn't do anything for me.
In Vim, I solved it by re-saving the file as UTF-8 with: :w ++enc=utf-8
Upvotes: 2
Reputation: 1
I get this error when I try to parse a file encoded in Latin-1.
When you get this error, try saving the file as UTF-8 (and updating the charset declaration in the HTML) before running pandoc.
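For many files, the re-encoding step can be scripted rather than done by hand. This is a rough sketch that assumes the failing files really are ISO-8859-1 (Latin-1); if yours use another encoding, set `SRC_ENC` accordingly. The function names `to_utf8` and `convert_html` are hypothetical. It does not touch the charset declaration inside the HTML.

```shell
#!/bin/sh
# ASSUMPTION: the failing files are ISO-8859-1 (Latin-1).
# Override by exporting SRC_ENC before running.
SRC_ENC=${SRC_ENC:-ISO-8859-1}

# to_utf8 FILE: write FILE re-encoded as UTF-8 to stdout.
to_utf8() {
    iconv -f "$SRC_ENC" -t UTF-8 "$1"
}

# convert_html FILE: re-encode FILE and let pandoc read it from stdin.
# Output goes to FILE.md, the same naming as the original find command.
convert_html() {
    to_utf8 "$1" | pandoc -f html -t markdown -o "$1.md"
}

# Example use on a single file that pandoc rejected:
#   convert_html ./Teaching/how_16825_make-lesson-book.html
```

Only run this on files that pandoc actually rejects. Re-encoding a file that is already UTF-8 as if it were Latin-1 would garble its non-ASCII characters.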
Upvotes: 0