Suleman

Reputation: 21

How to extract structured information from a PDF file in Java

I need to extract a table from a PDF file. I know the data is not stored in table format, but I want to read student results from the PDF in Java. Please help if anyone knows how. Thanks.

Upvotes: 2

Views: 3259

Answers (2)

mark stephens

Reputation: 3184

Some PDF files contain structured text (http://www.jpedal.org/PDFblog/2010/09/the-easy-way-to-discover-if-a-pdf-file-contains-structured-content/). If they do not, it is up to the parser's heuristics to guess the structure and add it.
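
A crude, dependency-free way to look for that structure is to scan the raw file for the `/StructTreeRoot` entry that tagged PDFs carry in their document catalog. This is a sketch under stated assumptions, not a real parser: if the catalog is stored inside a compressed object stream (allowed since PDF 1.5), the entry is not visible in the raw bytes and the check returns `false` even for a tagged file. A proper parser such as PDFBox (`PDDocumentCatalog.getStructureTreeRoot()`) gives a reliable answer. The class name `TaggedPdfCheck` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TaggedPdfCheck {

    // Crude heuristic: a tagged (structured) PDF normally has a /StructTreeRoot
    // entry in its document catalog. If the catalog sits inside a compressed
    // object stream, the entry is invisible here, so "false" is not conclusive.
    static boolean looksTagged(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data survives
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        return raw.contains("/StructTreeRoot");
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java TaggedPdfCheck file.pdf");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(looksTagged(bytes)
                ? "Structure tree found: the PDF is probably tagged"
                : "No structure tree in the raw bytes (it may still be in a compressed object stream)");
    }
}
```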

The PDFBox developers did a lot of work on tables, but it will never be perfect.
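
When the file has no structure, a simple heuristic applied to the extracted text can often recover the rows. The sketch below assumes the text came from an extractor (for example PDFBox's `PDFTextStripper`) run in a way that preserves the layout, so that column gaps appear as runs of two or more spaces while words inside a cell are separated by a single space; whether your extractor does this depends on the tool and its settings. It also assumes every student row starts with a numeric roll number. `ResultRowParser`, the column layout and the sample data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ResultRowParser {

    // Assumption: columns are separated by two or more spaces, while words
    // inside one cell (e.g. a student's full name) are separated by one space.
    private static final Pattern COLUMN_GAP = Pattern.compile("\\s{2,}");

    // Assumption: every data row starts with a numeric roll number.
    private static final Pattern ROLL_NO = Pattern.compile("\\d+");

    static List<String[]> parseRows(String pageText, int expectedColumns) {
        List<String[]> rows = new ArrayList<>();
        for (String line : pageText.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = COLUMN_GAP.split(trimmed);
            // Keep only lines that look like a complete data row; headers,
            // page footers and wrapped lines are dropped.
            if (cells.length == expectedColumns && ROLL_NO.matcher(cells[0]).matches()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical page text, as a layout-preserving extractor might produce it
        String sample = String.join("\n",
                "Roll No   Name            Maths  Physics  Result",
                "101       Ali Khan        78     65       Pass",
                "102       Sara Ahmed      45     30       Fail",
                "Page 1 of 3");
        for (String[] row : parseRows(sample, 5)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Running `main` prints `[101, Ali Khan, 78, 65, Pass]` and `[102, Sara Ahmed, 45, 30, Fail]`; the header and the page footer are discarded because they fail the column-count or roll-number test. A name containing a double space, or a cell that wraps onto a second line, would break this heuristic, which is why table extraction of this kind is never perfect.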

Upvotes: 3

Mat

Reputation: 206899

You should use a PDF parser for that. Check out this list of open source PDF libraries for Java.

Upvotes: 3
