HamidIng

Reputation: 105

Split a PDF into multiple HTML files with pdf2htmlEX

I'm trying to split a PDF file into separate HTML files, one HTML file per PDF page. This is how I do it:

pdf2htmlEX --split-pages 1 LMS.pdf --page-filename lms%03.html

As a result I get an empty LMS.html plus other files: lms%031.html, lms%032.html. The problem is that those HTML files are not correctly formatted: they have no CSS styling.

Upvotes: 1

Views: 1812

Answers (1)

Daniel Bidulock

Reputation: 2354

Funny thing about that... I stumbled across your question while trying to solve an identical problem. I used the same command as yours, except without setting the --page-filename parameter. Using your example, my pdf2htmlEX call would be analogous to:

pdf2htmlEX --split-pages 1 LMS.pdf

Then I opened the main HTML file in Chrome and found a bunch of blank pages. After searching around a bit, I opened the same file in Firefox, and it worked. Very strange. No errors were reported in the console output. Of course, I hadn't even thought to look in the Chrome console. When I did, I found:

Uncaught NetworkError: Failed to execute 'send' on 'XMLHttpRequest': Failed to load 'file:///...'.

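Thank God for StackOverflow. I don't know why it works in Firefox, but Chrome refuses XMLHttpRequest calls to file:// URLs, and the error shows the main HTML file making exactly that kind of request. So if you're getting the errors reported by Chrome, you need to serve the files from a web server.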

The easiest and fastest way for me to do this was to change into the directory in which I converted the PDF and run:

python -m SimpleHTTPServer

That is the Python 2 module name. On Python 3 the equivalent is python3 -m http.server.

By default, your page will be served up at http://localhost:8000. Problem solved. Use whatever server suits you best.
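If you'd rather have a small script than a one-liner, the same idea can be sketched with Python 3's standard library. This is only a sketch under a few assumptions: the helper names make_server and serve are made up for illustration, it needs Python 3.7 or later, and the directory you pass in is the one that holds the converted files.

```python
import functools
import http.server


def make_server(directory, port=8000):
    """Build an HTTP server that serves files from `directory` on localhost.

    `make_server` is a hypothetical helper name, not part of pdf2htmlEX.
    Passing port=0 lets the OS pick a free port.
    """
    # SimpleHTTPRequestHandler accepts a `directory` keyword since Python 3.7.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve(directory, port=8000):
    """Serve `directory` until interrupted with Ctrl+C."""
    with make_server(directory, port) as server:
        actual_port = server.server_address[1]
        print(f"Serving {directory} at http://localhost:{actual_port}")
        try:
            server.serve_forever()
        except KeyboardInterrupt:
            print("Stopping server")
```

Calling serve(".") from the directory with the converted files and then opening http://localhost:8000/LMS.html in Chrome should let the XMLHttpRequest calls go through, since the pages now come from http:// rather than file://.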

Upvotes: 3
