Reputation: 341
I'm experimenting with pdftohtml, but it occasionally fails to parse tables correctly: it groups the text from two columns into a single cell, which makes my attempts to parse the resulting data futile!
Note that this occurs only once or twice within a PDF and is quite unpredictable.
I've tried the latest versions of pdftohtml (including the 0.40a beta), but to no avail.
Is anyone aware of any Linux-compatible equivalents that might be worth trying?
Thanks,
Sam
Upvotes: 2
Views: 2120
Reputation: 1716
pdf2htmlEX is the best PDF-to-HTML converter I've seen.
Install (via Homebrew): brew install pdf2htmlex
I had to force the install: brew install -f pdf2htmlex
Example run: pdf2htmlEX --embed cfijo --dest-dir 'your-directory' your.pdf
That creates a new directory containing the .html file and the images it references.
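If you need to convert PDFs from a script before parsing the tables, here is a minimal Python sketch that wraps the command above. It assumes pdf2htmlEX is already on your PATH; the helper names build_command and convert are made up for illustration, and the flags are copied unchanged from the example run.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, dest_dir, embed="cfijo"):
    """Return the pdf2htmlEX argument list for one PDF.

    The --embed and --dest-dir flags are taken verbatim from the
    example run in this answer; nothing else is passed.
    """
    return [
        "pdf2htmlEX",
        "--embed", embed,
        "--dest-dir", str(dest_dir),
        str(pdf_path),
    ]


def convert(pdf_path, dest_dir):
    """Run pdf2htmlEX on one PDF and return the expected HTML path."""
    if shutil.which("pdf2htmlEX") is None:
        raise FileNotFoundError("pdf2htmlEX not found on PATH")
    # Create the destination up front so the call does not depend on
    # the tool creating it.
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_command(pdf_path, dest_dir), check=True)
    # Assumption: the output is named after the input with an .html
    # suffix; verify this against your installed version.
    return Path(dest_dir) / (Path(pdf_path).stem + ".html")
```

Calling convert("your.pdf", "your-directory") should then run the same command as the example run. Passing the arguments as a list rather than a shell string means file names with spaces need no extra quoting.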
Upvotes: 1