broundee

Reputation: 283

Is there a Java library for converting a document from PDF to HTML?

An open-source implementation is preferred.

Upvotes: 6

Views: 4637

Answers (3)

dacracot

Reputation: 22348

Try using PDFBox from the Apache Software Foundation.
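
As a rough illustration of the text-only route, here is a minimal sketch that wraps already-extracted plain text in a bare HTML page. It assumes you have obtained the text as a String beforehand (for example from PDFBox's PDFTextStripper.getText, which is not called here); the class and method names TextToHtml, escapeHtml and toHtml are made up for this example and are not part of PDFBox. It ignores layout, fonts, images and links entirely.

```java
public class TextToHtml {

    // Escape the characters that have special meaning in HTML.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build a minimal HTML page: blank lines separate paragraphs,
    // single line breaks inside a paragraph become <br>.
    // 'text' is assumed to be plain text already extracted from the PDF.
    static String toHtml(String title, String text) {
        StringBuilder html = new StringBuilder();
        html.append("<!DOCTYPE html>\n<html>\n<head><meta charset=\"utf-8\"><title>")
            .append(escapeHtml(title))
            .append("</title></head>\n<body>\n");
        for (String para : text.split("\\R\\s*\\R")) {
            String trimmed = para.trim();
            if (!trimmed.isEmpty()) {
                html.append("<p>")
                    .append(escapeHtml(trimmed).replaceAll("\\R", "<br>\n"))
                    .append("</p>\n");
            }
        }
        html.append("</body>\n</html>\n");
        return html.toString();
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from a PDF.
        String text = "First line\nsecond line\n\nA & B < C";
        System.out.print(toHtml("Sample", text));
    }
}
```

This only gives you readable text in a browser; anything closer to the original page layout needs a real converter.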

Upvotes: 1

PhiLho

Reputation: 41142

Obviously, it isn't an easy task: PDF formatting is much richer than HTML's (and you must also extract images and link them, etc.).
Simple text extraction is much easier (although not trivial).
The sidebar of your question shows a similar one, "Converting PDF to HTML with Python", which points to a library (poppler, apparently written in C++, which could perhaps be accessed through JNI/JNA) and to a related question that offers even more answers.

Upvotes: 2

Kablam

Reputation: 2591

The only ones I know of have to be paid for:

BFO
JPedal

Upvotes: 1
