Sam Crawford

Reputation: 341

Alternatives to pdftohtml

I'm experimenting with pdftohtml, but it occasionally fails to parse tables correctly: it groups the text from two columns into a single cell, which makes my attempts to parse the resulting data futile!

Note that this occurs only once or twice within a PDF and is quite unpredictable.

I've tried the latest versions of pdftohtml (including the 0.40a beta), but to no avail.

Is anyone aware of any Linux-compatible equivalents that might be worth trying?

Thanks,

Sam

Upvotes: 2

Views: 2120

Answers (1)

irth

Reputation: 1716

pdf2htmlEX is the best PDF-to-HTML converter I've seen.

Install (with Homebrew): brew install pdf2htmlex

I had to force the install: brew install -f pdf2htmlex

Example run: pdf2htmlEX --embed cfijo --dest-dir 'your-directory' your.pdf

That creates a new directory containing the .html file and the images it references.
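If you have many PDFs to convert, the run example above can be scripted. The following is a minimal sketch, not an official pdf2htmlEX interface: it only shells out to the same command line shown in the answer, and the helper names build_command and convert_all are hypothetical. As far as I recall from the pdf2htmlEX manual, lowercase letters in the --embed string switch embedding off for that element type, which would explain why the images end up as separate files; treat that as an assumption and check the manual for your version.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, dest_dir, embed="cfijo"):
    """Return the pdf2htmlEX argument list from the run example.

    Hypothetical helper: it only assembles the flags shown in the
    answer (--embed and --dest-dir) and adds no others.
    """
    return [
        "pdf2htmlEX",
        "--embed", embed,
        "--dest-dir", str(dest_dir),
        str(pdf_path),
    ]


def convert_all(pdf_dir, out_root):
    """Convert every *.pdf in pdf_dir, one output directory per PDF."""
    # Fail early with a clear message if the binary is not installed.
    if shutil.which("pdf2htmlEX") is None:
        raise RuntimeError("pdf2htmlEX not found on PATH")
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        dest = Path(out_root) / pdf.stem
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(build_command(pdf, dest), check=True)
```

Calling convert_all("pdfs", "html") would then write each result under html/<pdf name>/, which makes it easier to spot the one or two documents where a table still comes out wrong.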

Upvotes: 1
